Cherry-pick gpcontrib/diskquota and other recent main changes. #1616
reshke merged 346 commits into apache:REL_2_STABLE from
Conversation
This PR fixes a CI build failure by ignoring the distribution notice in CTAS statements. Co-authored-by: Hao Zhang <hzhang2@vmware.com>
…full (apache#112) * Fix bug: the relation_cache_entry of a temporary table created during VACUUM FULL is not removed after VACUUM FULL finishes. Such a table is then treated as an uncommitted table even though it has already been dropped, and its size remains in diskquota.table_size, making the recorded quota usage larger than the real usage. Use RelidByRelfilenode() to check whether the table is committed, and remove its relation_cache_entry. Co-authored-by: hzhang2 <hzhang2@vmware.com> Co-authored-by: Xing Guo <higuoxing+github@gmail.com> Co-authored-by: Xuebin Su (苏学斌) <sxuebin@vmware.com>
The SQL statement does not match the expected output in test_vacuum.sql. Co-authored-by: hzhang2 <hzhang2@vmware.com>
…e index on ao_table` is committed (apache#113) We cannot calculate the size of pg_aoblkdir_xxxx before `create index on ao_table` is committed, because: 1. We lack the ability to parse the name of pg_aoblkdir_xxxx. 2. pg_aoblkdir_xxxx is created by `create index on ao_table`, so it cannot be found by diskquota_get_appendonly_aux_oid_list() before the index's creation. Solution: 1. Parse names beginning with `pg_aoblkdir`. 2. When blkdirrelid is missing, try to fetch it by traversing relation_cache. Co-authored-by: hzhang2 <hzhang2@vmware.com> Co-authored-by: Xing Guo <higuoxing+github@gmail.com> Co-authored-by: Xuebin Su (苏学斌) <sxuebin@vmware.com>
Co-authored-by: Xuebin Su (苏学斌) <12034000+xuebinsu@users.noreply.github.com> Co-authored-by: Xing Guo <higuoxing@gmail.com> Co-authored-by: Hao Zhang <hzhang2@vmware.com>
Consider a user session that does a DELETE followed by a VACUUM FULL to reclaim the disk space. If, at the same time, the bgworker loads config by doing a SELECT, and the SELECT begins before the DELETE ends and ends after the VACUUM FULL begins: bgw: ---------[ SELECT ]-----------> usr: ---[ DELETE ]-[ VACUUM FULL ]--> then the deleted tuples will be marked as RECENTLY_DEAD instead of DEAD. As a result, the deleted tuples cannot be removed by VACUUM FULL. The fix makes the user session wait for the bgworker to finish the current SELECT before starting VACUUM FULL.
…ache#116) When doing VACUUM FULL, the table size may not be updated if the table's oid is pulled before its relfilenode is swapped. This fix keeps the table's oid in shared memory if the table is being altered, i.e., is locked in ACCESS EXCLUSIVE mode. Co-authored-by: Xuebin Su <sxuebin@vmware.com>
Currently, diskquota.pause() only takes effect on quota checking. Bgworkers still go through the loop to refresh quota even if diskquota is paused. This wastes computation resources and can cause flaky tests. This fix makes bgworkers skip refreshing quota entirely when the user pauses diskquota, avoiding those issues. Table sizes can still be updated correctly after resume.
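The intended control flow can be sketched as a loop that checks the pause flag before doing any work (a minimal hypothetical illustration, not diskquota's actual code; all names here are made up):

```shell
# Hypothetical sketch: a worker loop that skips the expensive quota
# refresh entirely while diskquota is paused.
paused=true
refreshes=0
for epoch in 1 2 3; do
  if [ "$paused" = true ]; then
    continue                      # paused: skip refreshing quota this epoch
  fi
  refreshes=$((refreshes + 1))    # would refresh quota for active tables
done
echo "refreshes while paused: $refreshes"
```

With the flag set, the refresh counter stays at zero, which is the behavior the patch wants from the real bgworker.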
ci: create rhel8 release build. Signed-off-by: Sasasu <i@sasa.su> Co-authored-by: Xuebin Su <sxuebin@vmware.com>
Currently, deadlock can occur when 1. A user session is doing DROP EXTENSION, and 2. A bgworker is loading quota configs using SPI. This patch fixes the issue by pausing diskquota before DROP EXTENSION so that the bgworker will not load config anymore. Note that this cannot be done using object_access_hook() because the extension object is dropped AFTER dropping all tables that belong to the extension.
Test case test_primary_failure stops and starts a segment to produce a mirror switch. But the segment start could fail while replaying xlog. The failure was caused by tablespace directories deleted in previous test cases. This commit removes the "rm" statements in those tablespace test cases and adds "-p" to the "mkdir" command line. The corresponding sub-directories will be deleted by "DROP TABLESPACE" if the case passes. Relevant logs: 2022-02-08 10:09:30.458183 CST,,,p1182584,th1235613568,,,,0,,,seg1,,,,,"LOG","00000","entering standby mode",,,,,,,0,,"xlog.c",6537, 2022-02-08 10:09:30.458670 CST,,,p1182584,th1235613568,,,,0,,,seg1,,,,,"LOG","00000","redo starts at E/24638A28",,,,,,,0,,"xlog.c",7153, 2022-02-08 10:09:30.468323 CST,"cc","postgres",p1182588,th1235613568,"[local]",,2022-02-08 10:09:30 CST,0,,,seg1,,,,,"FATAL","57P03","the database system is starting up" ,"last replayed record at E/2481EA70",,,,,,0,,"postmaster.c",2552, 2022-02-08 10:09:30.484792 CST,,,p1182584,th1235613568,,,,0,,,seg1,,,,,"FATAL","58P01","directory ""/tmp/test_spc"" does not exist",,"Create this directory for the table space before restarting the server.",,,"xlog redo create tablespace: 2590660 ""/tmp/test_spc""",,0,,"tablespace.c",749,
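The effect of adding -p can be seen in isolation (a generic illustration; the directory name below echoes the /tmp/test_spc location from the logs but is created under a temp dir):

```shell
# mkdir -p is idempotent: it succeeds whether or not the directory
# already exists, so re-running a test case after DROP TABLESPACE
# (or after a crash left the directory behind) is safe.
d="$(mktemp -d)/test_spc"
mkdir -p "$d"      # creates the directory
mkdir -p "$d"      # second call succeeds silently instead of failing
[ -d "$d" ] && echo "directory exists"
```

Without -p, the second mkdir would fail with "File exists", and without the removed "rm" the directory now stays in place until DROP TABLESPACE cleans it up.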
Otherwise, the compiler reports a warning: "comparison of constant ‘20’ with boolean expression is always false"
Each time the state of Diskquota is changed, we need to wait for the change to take effect using diskquota.wait_for_worker_new_epoch(). However, when the bgworker is not alive, such a wait can last forever. This patch fixes the issue by adding a timeout GUC so that wait() throws a NOTICE if it times out, making it more user-friendly. To fix a race condition on CREATE EXTENSION, the user needs to SELECT wait_for_worker_new_epoch() manually before writing data. This waits until the current database is added to the monitored db cache, so that active tables in the current database can be recorded. This patch also fixes the test script for activating standby and renames some of the cases to make them clearer.
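The bounded-wait idea can be sketched as a poll loop (a hedged illustration only; the real implementation is in C and the GUC name is not reproduced here, though wait_for_worker_new_epoch() is the UDF named above):

```shell
# Poll for a condition, and emit a NOTICE on timeout instead of
# blocking forever when the bgworker is not alive.
timeout_s=2
start=$SECONDS
timed_out=no
while :; do
  worker_alive=false                 # simulate a dead bgworker
  if $worker_alive; then
    echo "epoch advanced"
    break
  fi
  if [ $((SECONDS - start)) -ge "$timeout_s" ]; then
    timed_out=yes
    echo "NOTICE: wait_for_worker_new_epoch timed out after ${timeout_s}s"
    break
  fi
  sleep 1
done
```

The key property is that the loop always terminates: either the condition becomes true, or the caller gets a NOTICE after the configured timeout.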
- Change to use a GUC to set the hardlimit instead of a UDF, since the hardlimit setting needs to persist across postmaster restarts. - Fix relevant test cases.
Currently, the Diskquota launcher first starts a worker and then creates the worker entry. However, after the worker starts, it cannot find the entry when trying to check the is_paused status. Also, after a GPDB restart, when the QD checks whether the worker is running by checking the epoch, it might also fail to find the entry. This patch fixes the issue by first creating the worker entry and then starting the bgworker process.
The db cache stores which databases enable diskquota. Active tables are recorded only if they are in those databases. Previously, we created a new UDF, update_diskquota_db_list(), to add the current db to the cache. However, the UDF was installed in the wrong database. As a result, after the user upgrades from a previous version to 1.0.3, the bgworker cannot find the UDF and can do nothing. This patch fixes the issue by removing update_diskquota_db_list() and using fetch_table_stat() to update the db cache. fetch_table_stat() has existed since version 1.0.0, so no new UDF is needed. This PR replaces PR apache#99 and depends on PR apache#130 to fix a race condition that occurs after CREATE EXTENSION.
This is a code defect in original GPDB with resource groups enabled. There is a bug in calculating the length of pg_wchar in the `gpvars_check_gp_resource_group_cgroup_parent` function. For example, the value "greenplum database" was supposed to be judged an illegal name, reporting the error "gp_resource_group_cgroup_parent can only contains alphabet, number and non-leading . _ -". But it was wrongly judged as legal.
Use absolute artifact paths in the GPG verification step of
devops/release/cloudberry-release.sh.
Previously, the script verified SHA-512 using an absolute path but
called `gpg --verify` with relative file names. When running with
`--repo` from a different working directory, this could fail with
"No such file or directory" even though the `.asc` file existed in
the artifacts directory.
This change aligns the GPG verify command with the SHA-512 check by
verifying:
$ARTIFACTS_DIR/${TAR_NAME}.asc
against:
$ARTIFACTS_DIR/$TAR_NAME
No behavior change for successful local runs besides making path
resolution robust.
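The before/after difference can be illustrated with a hypothetical layout mirroring the fix (the tarball name below is an example, not the script's actual value; the gpg invocations are shown as comments since verification needs real keys):

```shell
# Verify using absolute paths so the result does not depend on the
# caller's working directory.
ARTIFACTS_DIR="$(mktemp -d)"
TAR_NAME="apache-cloudberry-2.0.0-incubating-src.tar.gz"   # example name
touch "$ARTIFACTS_DIR/$TAR_NAME" "$ARTIFACTS_DIR/${TAR_NAME}.asc"
cd /                                   # simulate running from elsewhere
# Before (relative names, breaks when $PWD is not $ARTIFACTS_DIR):
#   gpg --verify "${TAR_NAME}.asc" "$TAR_NAME"
# After (absolute paths, matching how the SHA-512 check already works):
#   gpg --verify "$ARTIFACTS_DIR/${TAR_NAME}.asc" "$ARTIFACTS_DIR/$TAR_NAME"
[ -f "$ARTIFACTS_DIR/${TAR_NAME}.asc" ] && echo "signature file found via absolute path"
```

From `/`, only the absolute form resolves the `.asc` file, which is exactly the failure mode described above.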
Add GUC_GPDB_NEED_SYNC flag to pax.enable_sparse_filter and pax.enable_row_filter so their values are dispatched from QD to QE segments. Without this flag, SET on the coordinator has no effect because scans run on QE segments.
…scoding COPY FROM with SEGMENT REJECT LIMIT had two bugs when encountering invalid multi-byte encoding sequences: 1. Encoding errors were double-counted: HandleCopyError() incremented rejectcount, then RemoveInvalidDataInBuf() incremented it again for the same error. This caused the reject limit to be reached twice as fast as expected. 2. SREH (Single Row Error Handling) was completely disabled when transcoding was required (file encoding != database encoding). Any encoding error during transcoding would raise an ERROR instead of skipping the bad row. Fix by removing the duplicate rejectcount++ from RemoveInvalidDataInBuf(), removing the !need_transcoding guard that blocked SREH for transcoding, and adding proper buffer cleanup for the transcoding case (advance raw_buf past the bad line using FindEolInUnverifyRawBuf). Add regression tests covering both non-transcoding (invalid UTF-8) and transcoding (invalid EUC_CN to UTF-8) cases with various reject limits. Fixes apache#1425
src/test/regress/sql/misc.sql is generated from src/test/regress/input/misc.source; it should not be added to the sql directory.
macOS BSD sed requires an explicit empty string argument after -i (sed -i '' 'script' file), unlike GNU sed which takes -i without a suffix argument. Without this fix, BSD sed misinterprets the sed script as a backup suffix and treats the filename as the script, causing "unterminated substitute pattern" error.
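A portable way to handle both dialects is to detect which sed is installed (GNU sed supports --version; BSD sed does not). This is a generic sketch, not necessarily the form used in the build scripts:

```shell
# Portable in-place edit: GNU sed accepts -i with no argument, while
# BSD sed (macOS) requires an explicit suffix, which may be empty.
f="$(mktemp)"
echo "hello world" > "$f"
if sed --version >/dev/null 2>&1; then
  sed -i 's/world/sed/' "$f"        # GNU sed
else
  sed -i '' 's/world/sed/' "$f"     # BSD sed: '' means no backup file
fi
result="$(cat "$f")"
echo "$result"
rm -f "$f"
```

On macOS, omitting the '' makes BSD sed take 's/world/sed/' as the backup suffix and the filename as the script, producing the "unterminated substitute pattern" error described above.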
Previously, GetViewBaseRelids() rejected any query with more than one base table, so materialized views defined with JOINs were never registered in gp_matview_aux/gp_matview_tables. This meant no status tracking and no staleness propagation for join matviews. Add a recursive helper extract_base_relids_from_jointree() that walks RangeTblRef, JoinExpr, and FromExpr nodes to collect all base relation OIDs. This is the only C function changed -- the existing downstream infrastructure (InsertMatviewTablesEntries, SetRelativeMatviewAuxStatus, MaintainMaterializedViewStatus, reference counting) already supports N base tables per matview. This is a first step toward AQUMV support for join queries. Users can also inspect a join matview's freshness status manually via gp_matview_aux. Key behaviors: - Self-joins (t1 JOIN t1) are deduplicated to one catalog entry - All join types supported: INNER, LEFT, RIGHT, FULL, implicit cross - Subquery/function RTEs in FROM are still rejected - Partitioned tables in joins propagate DML status correctly - Status escalation across multiple base tables works (i→e on delete) - Transaction rollback correctly reverts status changes Includes regression tests for: two/three-table joins, implicit joins, self-joins, all outer join types, mixed join types, join with GROUP BY, shared base tables across multiple MVs, multi-DML transactions, transaction rollback, cross joins, partitioned tables in joins, VACUUM FULL, TRUNCATE, WITH NO DATA, and DROP CASCADE.
ADD_DEFINITIONS(-DRUN_GTEST) and ADD_DEFINITIONS(-DRUN_GBENCH) are directory-scoped CMake commands that apply to ALL targets, including the production pax shared library. This caused test-only macros to be defined in production builds.

In pax_porc_adpater.cc, the leaked RUN_GTEST activates:

expect_hdr = rel_tuple_desc_->attrs[index].attlen == -1 &&
             rel_tuple_desc_->attrs[index].attbyval == false;
#ifdef RUN_GTEST
expect_hdr = false;
#endif

This forces expect_hdr to false in production, skipping the stripping of PostgreSQL varlena headers from dictionary entries. As a result, dictionary-encoded string columns return garbled data (varlena header bytes are included as part of the string content).

Replace ADD_DEFINITIONS with target_compile_definitions scoped to the test_main and bench_main targets only, so RUN_GTEST and RUN_GBENCH are no longer defined when building pax.so.
Oid is an unsigned int. Therefore, when an Oid reaches 2^31, printing it with %d displays a negative value. This is a defect in original GPDB. GPDB has fixed similar defects in commit 7279a1e ('Fix getResUsage integer overflow'), but there are still omissions.
…gn partitions (apache#1524) The storage type detection logic failed to properly identify mixed storage when foreign and non-foreign partitions coexisted, leading to incorrect metadata that could cause issues with scan type selection and query planning.
When ALTER TABLE ... SET WITH (reorganize=true) runs concurrently with COPY TO, COPY may return 0 rows instead of all rows. The root cause is a snapshot/lock ordering problem: PortalRunUtility() pushes the active snapshot before calling DoCopy(), so the snapshot predates any concurrent reorganize that had not yet committed. After COPY TO blocks on AccessExclusiveLock and the reorganize commits, the stale snapshot cannot see the new physical files (xmin = reorganize_xid is invisible) while the old physical files have already been removed, yielding 0 rows. Three code paths are fixed: 1. Relation-based COPY TO (copy.c, DoCopy): After table_openrv() acquires AccessShareLock — which blocks until any concurrent reorganize commits — pop and re-push the active snapshot so it reflects all committed data at lock-grant time. 2. Query-based COPY TO, RLS COPY TO, and CTAS (copyto.c, BeginCopy): After pg_analyze_and_rewrite() -> AcquireRewriteLocks() acquires all direct relation locks, refresh the snapshot. This covers COPY (SELECT ...) TO, COPY on RLS-protected tables (internally rewritten to a query), and CREATE TABLE AS SELECT. 3. Partitioned table COPY TO (copy.c, DoCopy): Before entering BeginCopy, call find_all_inheritors() to eagerly acquire AccessShareLock on all child partitions. Child partition locks are normally acquired later in ExecutorStart -> ExecInitAppend, after PushCopiedSnapshot has already embedded a stale snapshot. Locking all children upfront ensures the snapshot refresh in fixes 1 and 2 covers all concurrent child-partition reorganize commits. In REPEATABLE READ or SERIALIZABLE isolation, GetTransactionSnapshot() returns the same transaction-level snapshot, so the Pop/Push is a harmless no-op. Tests added: - src/test/isolation2/sql/copy_to_concurrent_reorganize.sql Tests 2.1-2.5 for relation-based, query-based, partitioned, RLS, and CTAS paths across heap, AO row, and AO column storage. 
- contrib/pax_storage/src/test/isolation2/sql/pax/copy_to_concurrent_reorganize.sql Same coverage for PAX columnar storage. See: Issue#1545 <apache#1545>
For RC tags like X.Y.Z-incubating-rcN, generate the source tarball filename and top-level directory using BASE_VERSION (without -rcN). This keeps the voted bits ready for promotion without rebuilding and avoids -rcN showing up in the extracted source directory.
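The derivation can be sketched with shell parameter expansion (variable names here are assumptions, not necessarily the script's own):

```shell
# Derive BASE_VERSION by stripping a trailing -rcN from the tag, and
# name the tarball after it so the voted bits need no rebuild.
RELEASE_TAG="2.0.0-incubating-rc3"          # example RC tag
BASE_VERSION="${RELEASE_TAG%-rc[0-9]*}"     # drop the -rcN suffix
TAR_NAME="apache-cloudberry-${BASE_VERSION}-src.tar.gz"
echo "$BASE_VERSION"
echo "$TAR_NAME"
```

The `%pattern` expansion removes the shortest matching suffix, so only the final -rcN is stripped and the -incubating part of the version is preserved.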
This commit introduces comprehensive support for Ubuntu 24.04 (Noble Numbat) across build environments and packaging metadata. Key changes and package updates for Ubuntu 24.04: - Compiler Upgrade: Migrated from GCC/G++ 11 to GCC/G++ 13 to align with Noble's default toolchain. - Python 3.12 Migration: Updated system Python to 3.12. Removed python3-distutils as it has been deprecated and removed from Ubuntu 24.04 repositories (PEP 632). - t64 Transition: Updated DEB runtime dependencies to include the 't64' suffix (e.g., libssl3t64, libapr1t64, libcurl4t64) to comply with Noble's mandatory 64-bit time_t ABI transition. - libcgroup Update: Switched from libcgroup1 to libcgroup2 to match the updated library names in Ubuntu 24.04. - PIP Compliance: Added --break-system-packages flag for PIP installations within the Dockerfile to satisfy PEP 668 requirements.
…t but we need them when re-redoing some tablespace related xlogs (e.g. database create with a tablespace) on mirror." This reverts commit 7a09e80.
Crash recovery on standby may encounter missing directories
when replaying database-creation WAL records. Prior to this
patch, the standby would fail to recover in such a case;
however, the directories could be legitimately missing.
Consider the following sequence of commands:
CREATE DATABASE
DROP DATABASE
DROP TABLESPACE
If, after replaying the last WAL record and removing the
tablespace directory, the standby crashes and has to replay the
create database record again, crash recovery must be able to continue.
A fix for this problem was already attempted in 49d9cfc, but it
was reverted because of design issues. This new version is based
on Robert Haas' proposal: any missing tablespaces are created
during recovery before reaching consistency. Tablespaces
are created as real directories, and should be deleted
by later replay. CheckRecoveryConsistency ensures
they have disappeared.
The problems detected by this new code are reported as PANIC,
except when allow_in_place_tablespaces is set to ON, in which
case they are WARNING. Apart from making tests possible, this
gives users an escape hatch in case things don't go as planned.
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Author: Paul Guo <paulguo@gmail.com>
Reviewed-by: Anastasia Lubennikova <lubennikovaav@gmail.com> (older versions)
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> (older versions)
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Diagnosed-by: Paul Guo <paulguo@gmail.com>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
On FreeBSD, the new test fails due to a WAL file being removed before the standby has had the chance to copy it. Fix by adding a replication slot to prevent the removal until after the standby has connected. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reported-by: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAEze2Wj5nau_qpjbwihvmXLfkAWOZ5TKdbnqOc6nKSiRJEoPyQ@mail.gmail.com
(cherry picked from commit 5794f50416361e8528ae5dfea269eee50261a741) Co-authored-by: Leonid <63977577+leborchuk@users.noreply.github.com>
Very Cool! Thanks! Due to the many commits in this PR, we need to merge it via the CLI. I can help with this. (Steps need to be added here.)
Hi!
Hi @reshke, Thanks for raising this. The main reason I suggested merging via CLI is that this PR contains many commits, and GitHub has limitations when handling large PRs (100+ commits) via the UI. For example, in another PR I recently helped with — #1547 — we also had to merge via CLI for the same reason. That said, once another committer approves this PR, we can try the UI merge option first. If GitHub allows it, we can certainly proceed that way. Otherwise, we can fall back to merging via CLI.
Just for reference: I’ve organized the CLI steps in this wiki page, which we can follow if we need to merge via CLI:
Take it easy. It's in our expectations. Feel free to try the CLI method yourself. I'm happy to help. By the way, I did a dry run on my local machine, and it works. If you have any questions, please let me know.
cp_rel_2_stable_from_main.txt
cherry-pick from ffe370b to 41c1750