-
Notifications
You must be signed in to change notification settings - Fork 94
Fix non-deterministic OCI digests from time.Now() in creation timestamp #654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix non-deterministic OCI digests from time.Now() in creation timestamp #654
Conversation
Replace time.Now() with deterministic timestamps in the builder and HuggingFace model packages so that pulling the same model revision always produces the same OCI digest. For HuggingFace models, the repo's lastModified date is fetched via the API and used as the creation timestamp. For the builder APIs, new optional WithCreated/WithCreatedTime options allow callers to supply a fixed timestamp. Fixes docker#647
Manual test resultsAll distribution tests pass: New test details:
Backward compatibility: |
Summary of ChangesHello @veeceey, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves the problem of non-deterministic OCI digests when pulling HuggingFace models. By incorporating the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 1 issue, and left some high level feedback:
- In
GetRepoInfo, you rely on the defaulttime.TimeJSON decoding forlastModified; consider unmarshalling into a string and parsing with an explicit layout (matching HuggingFace’s format) so you have a controlled failure mode if their timestamp format changes. - The logic for selecting the
createdtimestamp is now duplicated infromFormatandFromDirectory; consider factoring this into a small helper or reusing the same options type so future changes to creation-time handling stay consistent.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `GetRepoInfo`, you rely on the default `time.Time` JSON decoding for `lastModified`; consider unmarshalling into a string and parsing with an explicit layout (matching HuggingFace’s format) so you have a controlled failure mode if their timestamp format changes.
- The logic for selecting the `created` timestamp is now duplicated in `fromFormat` and `FromDirectory`; consider factoring this into a small helper or reusing the same options type so future changes to creation-time handling stay consistent.
## Individual Comments
### Comment 1
<location> `pkg/distribution/huggingface/client.go:186-191` </location>
<code_context>
+// GetRepoInfo fetches repository metadata from the HuggingFace API.
+// This returns information such as the last modified timestamp, which is useful
+// for producing deterministic OCI digests.
+func (c *Client) GetRepoInfo(ctx context.Context, repo, revision string) (*RepoInfo, error) {
+ if revision == "" {
+ revision = "main"
+ }
+
+ url := fmt.Sprintf("%s/api/models/%s/revision/%s", c.baseURL, repo, revision)
+
+ req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, http.NoBody)
</code_context>
<issue_to_address>
**issue (bug_risk):** Revision (and possibly repo) should be URL-escaped to handle slashes and special characters correctly.
Since `revision` is interpolated directly into the path, values like `refs/pr/1` will be interpreted as extra path segments instead of a single revision. Percent-encode the revision (and likely `repo`) using `url.URL`/`path.Join` and `url.PathEscape`, consistent with other endpoints, so repos/revisions containing slashes or special characters work correctly.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| func (c *Client) GetRepoInfo(ctx context.Context, repo, revision string) (*RepoInfo, error) { | ||
| if revision == "" { | ||
| revision = "main" | ||
| } | ||
|
|
||
| url := fmt.Sprintf("%s/api/models/%s/revision/%s", c.baseURL, repo, revision) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Revision (and possibly repo) should be URL-escaped to handle slashes and special characters correctly.
Since revision is interpolated directly into the path, values like refs/pr/1 will be interpreted as extra path segments instead of a single revision. Percent-encode the revision (and likely repo) using url.URL/path.Join and url.PathEscape, consistent with other endpoints, so repos/revisions containing slashes or special characters work correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively addresses the issue of non-deterministic OCI digests by introducing a mechanism to provide a fixed creation timestamp. The implementation uses the functional options pattern cleanly in the builder, and gracefully falls back to the old behavior if a timestamp cannot be fetched from the HuggingFace API. The changes are well-tested. I have a couple of minor suggestions to improve consistency and observability.
| // WithCreatedTime sets a specific creation timestamp for the model artifact | ||
| // built from a directory. When not set, the current time (time.Now()) is used. | ||
| // This is useful for producing deterministic OCI digests when the same directory | ||
| // content should always yield the same artifact regardless of when it was built. | ||
| func WithCreatedTime(t time.Time) DirectoryOption { | ||
| return func(opts *DirectoryOptions) { | ||
| opts.Created = &t | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with the BuildOption defined in pkg/distribution/builder/builder.go, it would be better to name this function WithCreated instead of WithCreatedTime. Both functions serve the same purpose of setting a creation timestamp, and using the same name improves code clarity and maintainability. You will need to update the call sites in pkg/distribution/builder/from_directory_test.go as well.
| // WithCreatedTime sets a specific creation timestamp for the model artifact | |
| // built from a directory. When not set, the current time (time.Now()) is used. | |
| // This is useful for producing deterministic OCI digests when the same directory | |
| // content should always yield the same artifact regardless of when it was built. | |
| func WithCreatedTime(t time.Time) DirectoryOption { | |
| return func(opts *DirectoryOptions) { | |
| opts.Created = &t | |
| } | |
| // WithCreated sets a specific creation timestamp for the model artifact | |
| // built from a directory. When not set, the current time (time.Now()) is used. | |
| // This is useful for producing deterministic OCI digests when the same directory | |
| // content should always yield the same artifact regardless of when it was built. | |
| func WithCreated(t time.Time) DirectoryOption { | |
| return func(opts *DirectoryOptions) { | |
| opts.Created = &t | |
| } | |
| } |
| var createdTime *time.Time | ||
| repoInfo, err := client.GetRepoInfo(ctx, repo, revision) | ||
| if err == nil && !repoInfo.LastModified.IsZero() { | ||
| createdTime = &repoInfo.LastModified | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error from client.GetRepoInfo is silently ignored. While falling back to time.Now() is the correct behavior, logging this error at a warning level would improve observability. This would help users understand why they might not be getting deterministic OCI digests if they were expecting them.
| var createdTime *time.Time | |
| repoInfo, err := client.GetRepoInfo(ctx, repo, revision) | |
| if err == nil && !repoInfo.LastModified.IsZero() { | |
| createdTime = &repoInfo.LastModified | |
| } | |
| var createdTime *time.Time | |
| repoInfo, err := client.GetRepoInfo(ctx, repo, revision) | |
| if err != nil { | |
| log.Printf("Warning: could not fetch repo info for deterministic timestamp: %v. Falling back to current time.", err) | |
| } else if !repoInfo.LastModified.IsZero() { | |
| createdTime = &repoInfo.LastModified | |
| } |
|
This looks ok, so far. Could you go through the Gemini comments @veeceey ? |
Pulling the same HuggingFace model at different times produces different OCI digests because
time.Now()is used as the creation timestamp in the config. This makes content-addressability unreliable since identical model content yields different manifests.Changes
WithCreated(time.Time)build option toFromPath/FromPathsso callers can supply a deterministic timestamp. Usestime.Now()as fallback when not specified, keeping backward compatibility via variadic params.WithCreatedTime(time.Time)directory option for the same purpose.GetRepoInfo()to fetch repo metadata includinglastModified.lastModifiedfrom the HF API after downloading files and pass it as the creation timestamp. Falls back gracefully totime.Now()if the API call fails.Test plan
GetRepoInfo(success, default revision, 404)go test ./pkg/distribution/...)Fixes #647