Conversation
🤖 Hi @ChingTsai, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.
📋 Review Summary
This Pull Request introduces the implementation of Qwen2 models, including new decoder layers, weight mappings, and configuration updates. The changes integrate Qwen2 into the existing MaxText framework, extending its model compatibility.
🔍 General Feedback
- The generalization of Qwen3 mappings and hook functions to a unified Qwen approach in `hf_shape.py` and `param_mapping.py` is a good practice, improving code reusability and maintainability.
- New configuration files for Qwen2.5 models are well-structured and consistent with existing model configurations.
- Ensure consistent handling of attention biases across the model definition and weight mapping to prevent potential runtime issues.
Thanks for bringing up new models! We usually verify implementations using this script against the HF version. Please let us know if you run into any issues.
cc @parambole, who is working on Qwen3, to help review this PR
Hi @RissyRan, I noticed that the 7b scanned checkpoint has a higher max KL divergence of 0.016245 (see logs). I've updated the threshold (0.015 -> 0.017) to allow this to pass, but please let me know if this level of divergence is a concern.
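For reference, the max-KL comparison discussed here can be sketched as below. This is a minimal hypothetical illustration, not the repository's actual verification script; the logit arrays are random stand-ins, and only the 0.017 threshold comes from the discussion above.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over the last (vocab) axis.
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))

def max_kl_divergence(ref_logits, test_logits):
    # Max over positions of KL(ref || test), computed from raw logits.
    ref_logp = log_softmax(ref_logits)
    test_logp = log_softmax(test_logits)
    kl = np.sum(np.exp(ref_logp) * (ref_logp - test_logp), axis=-1)
    return float(np.max(kl))

# Hypothetical stand-in logits (positions x vocab); a real check would load
# HF and MaxText logits produced for the same prompts.
rng = np.random.default_rng(0)
hf_logits = rng.normal(size=(4, 16))
mt_logits = hf_logits + rng.normal(scale=0.01, size=hf_logits.shape)
assert max_kl_divergence(hf_logits, mt_logits) < 0.017  # threshold from the PR
```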
RissyRan left a comment
Thanks for adding new models and tests! Overall LGTM!
As checkpoint conversion is bi-directional here, could you try converting your Orbax checkpoint to HuggingFace format and compare it against the original HuggingFace checkpoint to see if the values match? One reference.
@hengtaoguo could you also take a look, especially at the weight transfer part?
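A round-trip comparison like the one suggested above could be sketched as follows. `compare_state_dicts` and the tensor names are hypothetical illustrations, not part of the repo's actual conversion tooling; a real check would load the original HF checkpoint and the Orbax-to-HF converted one.

```python
import numpy as np

def compare_state_dicts(original, round_tripped, atol=1e-5):
    """Diff two {name: array} dicts.

    Returns (keys present on only one side, keys whose values or shapes
    differ beyond atol)."""
    missing = sorted(set(original) ^ set(round_tripped))
    mismatched = []
    for name in sorted(set(original) & set(round_tripped)):
        a = np.asarray(original[name])
        b = np.asarray(round_tripped[name])
        if a.shape != b.shape or not np.allclose(a, b, atol=atol):
            mismatched.append(name)
    return missing, mismatched

# Hypothetical weights standing in for HF tensors before/after the round trip.
orig = {"embed_tokens.weight": np.full((4, 2), 0.5)}
back = {"embed_tokens.weight": np.full((4, 2), 0.5)}
missing, bad = compare_state_dicts(orig, back)
assert missing == [] and bad == []
```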
| @@ -0,0 +1,38 @@ |
| # Copyright 2023–2025 Google LLC |
nit: 2026; similar for other files if applicable.
| # See the License for the specific language governing permissions and |
| # limitations under the License. |
| # model config for qwen2.5-14b |
Nit: could you help add a HuggingFace reference link for these, like https://huggingface.co/Qwen/Qwen2.5-14B-Instruct/blob/main/config.json? Similar for other files if applicable.
| vocab_size: 152064 |
| decoder_block: "qwen2" |
nit: shall we remove the extra blank lines to align with the 7b config, if there's no specific reason for them?
| ## Running the End-to-End Test |
| The `test_qwen2.5-14b.sh` script automates the following steps: |
Shall we put 7b here (swap it in this doc)? I see the 7b model is more popular on HF, with 19M downloads last month. https://screenshot.googleplex.com/56Gg4wPjEWEMMsw
| --run_hf_model=True \ |
| --hf_model_path=${HF_MODEL_ID} |
| # Step 3: SFT |
Shall we add one test for pre-train (without loading a ckpt) and decode (with an unscanned ckpt)? I think 7b and 14b share the same structure, so it's fine to just add one to the end_to_end test script.
Hi @RissyRan, I have completed the HF conversion checks for qwen2.5-7B and qwen2.5-14B. Thanks!
| bash tests/end_to_end/tpu/qwen/dense/qwen2.5-7b/test_qwen2.5-7b.sh |
| ``` |
| #### Expected output |
Let's remove this section? The alignment could change based on precision and other configs. I don't want to give customers the wrong impression that rank_agreement_percentage is at 40.0%. Similar for the performance numbers in pre-train.
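For context on the metric mentioned above, a rank-agreement computation could look like the sketch below. The function name and the top-k definition are assumptions for illustration; the repo's exact definition of `rank_agreement_percentage` isn't shown in this thread.

```python
import numpy as np

def rank_agreement_percentage(logits_a, logits_b, top_k=5):
    """Percent of positions where both models produce the same top-k token
    ranking (hypothetical definition; the repo's metric may differ)."""
    # Negate so argsort yields descending order, then keep the top-k ids.
    top_a = np.argsort(-logits_a, axis=-1)[:, :top_k]
    top_b = np.argsort(-logits_b, axis=-1)[:, :top_k]
    return 100.0 * float(np.mean(np.all(top_a == top_b, axis=-1)))

# Identical logits must agree at every position.
logits = np.random.default_rng(0).normal(size=(8, 32))
assert rank_agreement_percentage(logits, logits) == 100.0
```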
RissyRan left a comment
LGTM! Please consider removing the expected results.
Description

Changes
Fix b/471703114

Tests
- Logit Verification for HF -> MT
- MT -> HF Checkpoint Check

Checklist
Before submitting this PR, please make sure (put X in square brackets):
- [ ] `gemini-review` label applied.