Changes from all commits (19 commits)
5b87d81
confluence-mdx: R2 — define SyncProfile and add a sync_code field to Config
jk-kim0 Mar 18, 2026
27aa2d2
confluence-mdx: R1 — split the pages.yaml filename by Space code
jk-kim0 Mar 18, 2026
9ffa5ff
confluence-mdx: R4 — remove list.txt generation
jk-kim0 Mar 18, 2026
ae1c307
confluence-mdx: R5 — split the image_status.py fetch_state report per Space
jk-kim0 Mar 18, 2026
e2056d9
confluence-mdx: R6 — handle the Confluence folder content type correctly during page traversal
jk-kim0 Mar 18, 2026
40622a2
confluence-mdx: update entrypoint.sh and compose.yml for multi-Space support
jk-kim0 Mar 18, 2026
cdcd014
confluence-mdx: migrate var/pages.yaml → var/pages.qm.yaml
jk-kim0 Mar 18, 2026
e47ee03
confluence-mdx: record the QCP Space root page ID
jk-kim0 Mar 18, 2026
1ca25b0
confluence-mdx: update reverse_sync_cli.py's pages.yaml references to pages.qm.yaml…
jk-kim0 Mar 19, 2026
1f293bc
confluence-mdx: update run-tests.sh's pages.yaml references to pages.qm.yaml
jk-kim0 Mar 19, 2026
9fd3b42
confluence-mdx: add a pages.qm.yaml fallba… to the converter/cli.py pages.yaml lookup
jk-kim0 Mar 19, 2026
f20190f
chore: .codegraph/, confluence-mdx/docs/superpowers/, confluence-mdx/…
jk-kim0 Mar 19, 2026
9712421
chore: remove personal local entries from confluence-mdx/.gitignore
jk-kim0 Mar 19, 2026
c9a4bf0
confluence-mdx: fix the first fetch failure for folder root pages
jk-kim0 Mar 19, 2026
2af1dbb
confluence-mdx: pass the pages-yaml path to the converter subprocess
jk-kim0 Mar 19, 2026
da8535e
confluence-mdx: add a pages.qcp.yaml bind mount to compose.yml
jk-kim0 Mar 19, 2026
3ef51f2
confluence-mdx: unused_attachments.py and link_resolver.py to sync_profile…
jk-kim0 Mar 19, 2026
1c081f9
confluence-mdx: find_mdx_with_text.py and README.md to the pages.<code>.yaml …
jk-kim0 Mar 19, 2026
d93702f
confluence-mdx: find_mdx_with_text.py Confluence links to the sync_code space…
jk-kim0 Mar 19, 2026
3 changes: 3 additions & 0 deletions .gitignore
@@ -52,3 +52,6 @@ next-env.d.ts

# git worktrees
.worktrees/

# CodeGraph semantic index (auto-generated)
/.codegraph/
1 change: 0 additions & 1 deletion confluence-mdx/.gitignore
@@ -8,5 +8,4 @@
/bin/mdx_to_storage/__pycache__/
/tests/__pycache__/
/tests/test_mdx_to_storage/__pycache__/
/var/list.txt
/reports/
18 changes: 8 additions & 10 deletions confluence-mdx/README.md
@@ -80,10 +80,10 @@ pip3 install requests beautifulsoup4 pyyaml

1. Store Confluence document data in `confluence-mdx/var/`.
- For each document, save `<page_id>/page.xhtml`, `<page_id>/page.v1.yaml`, and so on.
- Save the full document list to `var/pages.yaml`.
- Save the full document list to `var/pages.<code>.yaml` (e.g. `var/pages.qm.yaml`).
- Use `fetch_cli.py`.
2. Generate MDX documents under `src/content/ko/`.
- Convert all pages based on `var/pages.yaml`.
- Convert all pages based on `var/pages.<code>.yaml`.
- Use `convert_all.py`.
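
The per-Space naming rule introduced by this PR can be sketched as a one-line helper, mirroring the `pages_yaml_filename` property added to `Config` in `bin/fetch/config.py` (the standalone function form here is illustrative, not the actual code):

```python
# Sketch of the per-Space pages YAML naming rule
# (mirrors Config.pages_yaml_filename in bin/fetch/config.py).
def pages_yaml_filename(sync_code: str) -> str:
    return f"pages.{sync_code}.yaml"

print(pages_yaml_filename("qm"))   # pages.qm.yaml
print(pages_yaml_filename("qcp"))  # pages.qcp.yaml
```

Each sync profile (e.g. `qm`, `qcp`) therefore gets its own document list file under `var/`.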

Follow-along walkthrough
@@ -130,8 +130,7 @@ bin/fetch_cli.py --attachments
bin/fetch_cli.py --local

# Command used for repeated runs while iterating on fetch_cli.py locally.
# Alternatively, run this when you want to update var/list.txt.
bin/fetch_cli.py --local >var/list.txt
bin/fetch_cli.py --local

# Download a specific page ID and its descendants, including attachments.
# Used when only some documents changed, to fetch that document and its sub-pages via the API and save them.
@@ -156,23 +155,22 @@ bin/fetch_cli.py --log-level DEBUG
Result:
- Document data is saved in the `var/` directory.
- `page.yaml` and `page.xhtml` files are saved in the directory for each page ID.
- Redirecting stdout with `>list.txt` saves the document list to `list.txt`.

### 2. Full conversion (convert_all.py)

`convert_all.py` is a script that converts every page to MDX based on `var/pages.yaml`.
`convert_all.py` is a script that converts every page to MDX based on `var/pages.<code>.yaml`.
It automatically checks for missing translations before converting.

How to run:
```bash
# Full conversion (includes translation verification)
# Full conversion (includes translation verification; default: --sync-code qm)
bin/convert_all.py

# Convert the QCP Space
bin/convert_all.py --sync-code qcp

# Verify translations only (no conversion)
bin/convert_all.py --verify-translations

# Generate list.txt / list.en.txt for debugging (conversion also runs)
bin/convert_all.py --generate-list
```

Result:
55 changes: 15 additions & 40 deletions confluence-mdx/bin/convert_all.py
@@ -6,9 +6,9 @@
replaces them with a single command.

Usage:
bin/convert_all.py                        # Full conversion
bin/convert_all.py                        # Full conversion (default: --sync-code qm)
bin/convert_all.py --sync-code qcp        # Convert the QCP Space
bin/convert_all.py --verify-translations  # Verify translations only
bin/convert_all.py --generate-list        # Generate list.txt / list.en.txt
"""

import argparse
@@ -77,36 +77,8 @@ def verify_translations(pages: List[Dict], translations: Dict[str, str]) -> List
return missing


def generate_list_files(pages: List[Dict], output_dir: str) -> None:
"""Generate list.txt (Korean) and list.en.txt (English) from pages.yaml."""
list_txt_lines = []
list_en_lines = []

# Skip the root page (first entry, single breadcrumb)
root_page_id = pages[0]['page_id'] if pages else None

for page in pages:
if page['page_id'] == root_page_id:
continue
breadcrumbs = page.get('breadcrumbs', [])
breadcrumbs_en = page.get('breadcrumbs_en', [])
list_txt_lines.append(f"{page['page_id']}\t{' />> '.join(breadcrumbs)}\n")
list_en_lines.append(f"{page['page_id']}\t{' />> '.join(breadcrumbs_en)}\n")

list_txt_path = os.path.join(output_dir, 'list.txt')
list_en_path = os.path.join(output_dir, 'list.en.txt')

with open(list_txt_path, 'w', encoding='utf-8') as f:
f.writelines(list_txt_lines)
print(f"Generated {list_txt_path} ({len(list_txt_lines)} entries)", file=sys.stderr)

with open(list_en_path, 'w', encoding='utf-8') as f:
f.writelines(list_en_lines)
print(f"Generated {list_en_path} ({len(list_en_lines)} entries)", file=sys.stderr)


def convert_all(pages: List[Dict], var_dir: str, output_base_dir: str, public_dir: str,
log_level: str) -> int:
log_level: str, pages_yaml: str = '') -> int:
"""Run converter/cli.py for each page. Returns number of failures."""
# Skip the root page
root_page_id = pages[0]['page_id'] if pages else None
@@ -148,6 +120,8 @@ def convert_all(pages: List[Dict], var_dir: str, output_base_dir: str, public_di
f'--attachment-dir={attachment_dir}',
f'--log-level={log_level}',
]
if pages_yaml:
cmd.append(f'--pages-yaml={pages_yaml}')

print(f"[{i}/{total}] {page_id} → {output_file}", file=sys.stderr)
result = subprocess.run(cmd, capture_output=True, text=True)
@@ -162,8 +136,10 @@ def main():
parser = argparse.ArgumentParser(
description='Batch convert all Confluence pages to MDX using pages.yaml'
)
parser.add_argument('--pages-yaml', default='var/pages.yaml',
help='Path to pages.yaml (default: var/pages.yaml)')
parser.add_argument('--sync-code', default='qm',
help='Sync profile code; used to auto-derive --pages-yaml (default: %(default)s)')
parser.add_argument('--pages-yaml', default=None,
help='Path to pages YAML (default: var/pages.<sync-code>.yaml)')
parser.add_argument('--var-dir', default='var',
help='Directory containing page data (default: var)')
parser.add_argument('--output-dir', default='target/ko',
@@ -174,13 +150,15 @@
help='Path to translations file')
parser.add_argument('--verify-translations', action='store_true',
help='Verify translation coverage and exit')
parser.add_argument('--generate-list', action='store_true',
help='Generate list.txt / list.en.txt for debugging')
parser.add_argument('--log-level', default='warning',
choices=['debug', 'info', 'warning', 'error', 'critical'],
help='Log level for converter/cli.py (default: warning)')
args = parser.parse_args()

# Auto-derive pages-yaml from sync-code if not explicitly provided
if args.pages_yaml is None:
args.pages_yaml = f'var/pages.{args.sync_code}.yaml'

# Resolve relative paths against project root (confluence-mdx/)
args.pages_yaml = _resolve(args.pages_yaml)
args.var_dir = _resolve(args.var_dir)
@@ -208,12 +186,9 @@ def main():
if args.verify_translations:
sys.exit(0)

# --generate-list: generate list files
if args.generate_list:
generate_list_files(pages, args.var_dir)

# Run conversions
failures = convert_all(pages, args.var_dir, args.output_dir, args.public_dir, args.log_level)
failures = convert_all(pages, args.var_dir, args.output_dir, args.public_dir, args.log_level,
pages_yaml=args.pages_yaml)

if failures:
print(f"\nCompleted with {failures} failure(s) out of {len(pages)} pages", file=sys.stderr)
13 changes: 11 additions & 2 deletions confluence-mdx/bin/converter/cli.py
@@ -123,6 +123,8 @@ def main():
parser.add_argument('--language',
choices=['ko', 'ja', 'en'],
help='Explicitly specify the language code (auto-detected from the output path if omitted)')
parser.add_argument('--pages-yaml',
help='Path to pages.<code>.yaml (if omitted, searched in order: input_dir/../pages.qm.yaml → pages.yaml)')
parser.add_argument('--page-dir',
help='Directory containing page data such as page.v1.yaml (default: directory of the input file)')
parser.add_argument('--log-level',
@@ -176,8 +178,15 @@ def main():
# Preserve the original XHTML — used by the sidecar mapping
xhtml_original = html_content

# Load pages.yaml to get the current page's path
pages_yaml_path = os.path.join(input_dir, '..', 'pages.yaml')
# Load pages YAML for internal link resolution and _meta.ts generation.
# Priority: --pages-yaml arg > pages.qm.yaml (new naming) > pages.yaml (legacy).
var_dir = os.path.join(input_dir, '..')
if args.pages_yaml:
pages_yaml_path = args.pages_yaml
else:
pages_yaml_path = os.path.join(var_dir, 'pages.qm.yaml')
if not os.path.exists(pages_yaml_path):
pages_yaml_path = os.path.join(var_dir, 'pages.yaml')
load_pages_yaml(pages_yaml_path, PAGES_BY_TITLE, PAGES_BY_ID)

# Load page.v1.yaml: prefer --page-dir; fall back to searching input_dir
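The fallback chain above can be summarized as a standalone function. This is an illustrative reconstruction of the lookup order in `converter/cli.py`, not the exact code:

```python
import os
from typing import Optional

def resolve_pages_yaml(input_dir: str, pages_yaml_arg: Optional[str]) -> str:
    """Resolution order used by converter/cli.py in this PR:
    explicit --pages-yaml > pages.qm.yaml (new naming) > pages.yaml (legacy).
    """
    if pages_yaml_arg:
        return pages_yaml_arg
    var_dir = os.path.join(input_dir, "..")
    candidate = os.path.join(var_dir, "pages.qm.yaml")
    if os.path.exists(candidate):
        return candidate
    # Legacy path is returned unconditionally, matching the diff:
    # only pages.qm.yaml's existence is checked.
    return os.path.join(var_dir, "pages.yaml")
```

Note that the legacy `pages.yaml` path is returned without an existence check, so a missing file still surfaces as a load error downstream rather than being silently skipped.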
26 changes: 20 additions & 6 deletions confluence-mdx/bin/fetch/api_client.py
@@ -53,14 +53,28 @@ def get_page_data_v1(self, page_id: str) -> Optional[Dict]:
url = f"{self.config.base_url}/rest/api/content/{page_id}?expand=title,ancestors,body.storage,body.view"
return self.make_request(url, "V1 API page data")

def get_page_data_v2(self, page_id: str) -> Optional[Dict]:
"""Get page data using V2 API"""
url = f"{self.config.base_url}/api/v2/pages/{page_id}?body-format=atlas_doc_format"
def get_page_data_v2(self, page_id: str, content_type: str = "page") -> Optional[Dict]:
"""Get page data using V2 API.

Uses /api/v2/folders/{id} for folder content type, /api/v2/pages/{id} otherwise.
"""
if content_type == "folder":
url = f"{self.config.base_url}/api/v2/folders/{page_id}"
else:
url = f"{self.config.base_url}/api/v2/pages/{page_id}?body-format=atlas_doc_format"
return self.make_request(url, "V2 API page data")

def get_child_pages(self, page_id: str) -> Optional[Dict]:
"""Get child pages using V2 API"""
url = f"{self.config.base_url}/api/v2/pages/{page_id}/children?type=page&limit=100"
def get_child_pages(self, page_id: str, content_type: str = "page") -> Optional[Dict]:
"""Get child pages using V2 API.

Uses /api/v2/folders/{id}/children for folder content type,
/api/v2/pages/{id}/children for page content type.
The type=page filter is omitted so that folder children are also included.
"""
if content_type == "folder":
url = f"{self.config.base_url}/api/v2/folders/{page_id}/children?limit=100"
else:
url = f"{self.config.base_url}/api/v2/pages/{page_id}/children?limit=100"
return self.make_request(url, "V2 API child pages")

def get_attachments(self, page_id: str) -> Optional[Dict]:
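The endpoint selection added to `api_client.py` can be sketched as a pure function, under the assumption that only the `page` and `folder` content types occur:

```python
def v2_children_url(base_url: str, page_id: str, content_type: str = "page") -> str:
    # Folders live under the /api/v2/folders/ namespace; the old
    # type=page filter is dropped so folder children are traversed too.
    if content_type == "folder":
        return f"{base_url}/api/v2/folders/{page_id}/children?limit=100"
    return f"{base_url}/api/v2/pages/{page_id}/children?limit=100"

wiki = "https://querypie.atlassian.net/wiki"
print(v2_children_url(wiki, "608501837", "folder"))
# https://querypie.atlassian.net/wiki/api/v2/folders/608501837/children?limit=100
```

Keeping the URL construction separate from the request logic makes this branch easy to unit-test without hitting the Confluence API.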
13 changes: 13 additions & 0 deletions confluence-mdx/bin/fetch/config.py
@@ -15,8 +15,16 @@ class Config:
"""Centralized configuration management"""
base_url: str = "https://querypie.atlassian.net/wiki"
space_key: str = "QM" # Confluence space key
sync_code: str = "qm" # Sync profile code (see fetch/sync_profiles.py)
days: Optional[int] = None # Number of days to look back (None = auto-detect from .fetch_state.yaml)
default_start_page_id: str = "608501837" # Root Page ID of "QueryPie Docs" (for breadcrumbs)
root_content_type: str = "page"
"""Confluence content type of the root page ('page' or 'folder').

Used by Stage 1 when page.v2.yaml does not yet exist (first run on a clean
environment), so the correct API endpoint is selected from the start.
Populated from SyncProfile.root_content_type in fetch_cli.py.
"""
quick_start_page_id: str = "544375784" # QueryPie Overview having less children
default_output_dir: str = "var"
cache_dir: str = "cache"
@@ -26,6 +34,11 @@ class Config:
download_attachments: bool = False
mode: str = "recent" # Mode: "local", "remote", or "recent"

@property
def pages_yaml_filename(self) -> str:
"""Filename for pages YAML, derived from sync_code."""
return f"pages.{self.sync_code}.yaml"

def __post_init__(self):
if self.email is None:
self.email = os.environ.get('ATLASSIAN_USERNAME', 'your-email@example.com')
33 changes: 4 additions & 29 deletions confluence-mdx/bin/fetch/processor.py
@@ -175,8 +175,7 @@ def run(self) -> None:
self.logger.info(f"Created output directory: {self.config.default_output_dir}")

# Prepare output file path
output_yaml_path = os.path.join(self.config.default_output_dir, "pages.yaml")
output_list_path = os.path.join(self.config.default_output_dir, "list.txt")
output_yaml_path = os.path.join(self.config.default_output_dir, self.config.pages_yaml_filename)

start_page_id = self.config.default_start_page_id

Expand Down Expand Up @@ -232,9 +231,7 @@ def run(self) -> None:
)

# Download each page through all 4 stages and output to stdout
# Store downloaded pages for list.txt
self.logger.warning(f"Downloading {len(modified_pages)} recently modified pages")
downloaded_list_lines = []
skipped_count = 0
for entry in modified_pages:
page_id = entry["id"]
@@ -257,8 +254,6 @@
# Output to stdout during download
breadcrumbs_str = " />> ".join(page.breadcrumbs) if page.breadcrumbs else ""
print(f"{page.page_id}\t{breadcrumbs_str}")
# Store for list.txt (only downloaded pages)
downloaded_list_lines.append(f"{page.page_id}\t{breadcrumbs_str}\n")
except Exception as e:
self.logger.error(f"Error downloading page ID {page_id}: {str(e)}")
continue
@@ -267,38 +262,25 @@
self.logger.warning(f"Skipped {skipped_count} pages (already up-to-date)")

# After downloading, process like local mode (hierarchical traversal from start_page_id)
# Generate pages.yaml and list.txt with full hierarchical tree (like --local mode)
# Generate pages.yaml with full hierarchical tree (like --local mode)
# No stdout output in this phase (like --local mode)
self.logger.warning(f"Processing page tree from start page ID {start_page_id} (local mode)")
page_count = 0
yaml_entries = []
list_lines = []

for page in self.fetch_page_tree_recursive(start_page_id, start_page_id, use_local=True):
if page:
breadcrumbs_str = " />> ".join(page.breadcrumbs) if page.breadcrumbs else ""
# No stdout output in local mode
# Exclude start_page_id from list.txt (root page is not converted to MDX)
if page.page_id != start_page_id:
list_lines.append(f"{page.page_id}\t{breadcrumbs_str}\n")
page_count += 1
yaml_entries.append(page.to_dict())

elif self.config.mode == "local":
# --local mode: Process existing local files hierarchically from start_page_id
# No stdout output in local mode
self.logger.warning(f"Local mode: Processing page tree from start page ID {start_page_id}")
page_count = 0
yaml_entries = []
list_lines = []

for page in self.fetch_page_tree_recursive(start_page_id, start_page_id, use_local=True):
if page:
breadcrumbs_str = " />> ".join(page.breadcrumbs) if page.breadcrumbs else ""
# No stdout output in local mode
# Exclude start_page_id from list.txt (root page is not converted to MDX)
if page.page_id != start_page_id:
list_lines.append(f"{page.page_id}\t{breadcrumbs_str}\n")
page_count += 1
yaml_entries.append(page.to_dict())

@@ -308,15 +290,13 @@
self.logger.warning(f"Remote mode: Processing page tree from start page ID {start_page_id} via API")
page_count = 0
yaml_entries = []
list_lines = []

for page in self.fetch_page_tree_recursive(start_page_id, start_page_id, use_local=False):
if page:
breadcrumbs_str = " />> ".join(page.breadcrumbs) if page.breadcrumbs else ""
# Exclude start_page_id from stdout and list.txt (root page is not converted to MDX)
# Exclude start_page_id from stdout (root page is not converted to MDX)
if page.page_id != start_page_id:
breadcrumbs_str = " />> ".join(page.breadcrumbs) if page.breadcrumbs else ""
print(f"{page.page_id}\t{breadcrumbs_str}")
list_lines.append(f"{page.page_id}\t{breadcrumbs_str}\n")
page_count += 1
yaml_entries.append(page.to_dict())

@@ -348,11 +328,6 @@
self.file_manager.save_yaml(output_yaml_path, yaml_entries)
self.logger.info(f"YAML data saved to {output_yaml_path}")

# Save list.txt file
if list_lines:
self.file_manager.save_file(output_list_path, "".join(list_lines))
self.logger.info(f"List file saved to {output_list_path}")

self.logger.info(f"Completed processing {page_count} pages")
except Exception as e:
self.logger.error(f"Error in main execution: {str(e)}")