Skip to content

Conversation

@veeceey
Copy link

@veeceey veeceey commented Feb 11, 2026

Pulling the same HuggingFace model at different times produces different OCI digests because time.Now() is used as the creation timestamp in the config. This makes content-addressability unreliable since identical model content yields different manifests.

Changes

  • Builder: Add optional WithCreated(time.Time) build option to FromPath/FromPaths so callers can supply a deterministic timestamp. Uses time.Now() as fallback when not specified, keeping backward compatibility via variadic params.
  • FromDirectory: Add WithCreatedTime(time.Time) directory option for the same purpose.
  • HuggingFace client: Add GetRepoInfo() to fetch repo metadata including lastModified.
  • HuggingFace model builder: Fetch lastModified from the HF API after downloading files and pass it as the creation timestamp. Falls back gracefully to time.Now() if the API call fails.

Test plan

  • New tests verify same file + same timestamp = identical digest
  • New tests verify different timestamps = different digests
  • New tests for GetRepoInfo (success, default revision, 404)
  • All existing tests pass (go test ./pkg/distribution/...)

Fixes #647

Replace time.Now() with deterministic timestamps in the builder and
HuggingFace model packages so that pulling the same model revision
always produces the same OCI digest.

For HuggingFace models, the repo's lastModified date is fetched via
the API and used as the creation timestamp. For the builder APIs,
new optional WithCreated/WithCreatedTime options allow callers to
supply a fixed timestamp.

Fixes docker#647
@veeceey
Copy link
Author

veeceey commented Feb 11, 2026

Manual test results

All distribution tests pass:

$ go test ./pkg/distribution/... -count=1
ok  	github.com/docker/model-runner/pkg/distribution/builder           0.190s
ok  	github.com/docker/model-runner/pkg/distribution/distribution      4.204s
ok  	github.com/docker/model-runner/pkg/distribution/files             0.353s
ok  	github.com/docker/model-runner/pkg/distribution/format            1.189s
ok  	github.com/docker/model-runner/pkg/distribution/huggingface       0.797s
ok  	github.com/docker/model-runner/pkg/distribution/internal/bundle   0.587s
ok  	github.com/docker/model-runner/pkg/distribution/internal/gguf     1.412s
ok  	github.com/docker/model-runner/pkg/distribution/internal/mutate   0.991s
ok  	github.com/docker/model-runner/pkg/distribution/internal/partial  2.453s
ok  	github.com/docker/model-runner/pkg/distribution/internal/progress 0.868s
ok  	github.com/docker/model-runner/pkg/distribution/internal/safetensors 2.659s
ok  	github.com/docker/model-runner/pkg/distribution/internal/store    2.930s
ok  	github.com/docker/model-runner/pkg/distribution/modelpack         1.754s
ok  	github.com/docker/model-runner/pkg/distribution/oci/authn         1.572s
ok  	github.com/docker/model-runner/pkg/distribution/packaging         1.990s
ok  	github.com/docker/model-runner/pkg/distribution/registry          2.601s
ok  	github.com/docker/model-runner/pkg/distribution/tarball           2.596s

New test details:

  • TestWithCreatedDeterministicDigest: builds the same GGUF file twice with an identical fixed timestamp, verifies digests match; then builds with a different timestamp and verifies the digest changes.
  • TestWithCreatedFromPaths: same as above but through FromPaths.
  • TestFromDirectoryWithCreatedTime: verifies WithCreatedTime produces deterministic digests from FromDirectory.
  • TestClientGetRepoInfo: mock server returns lastModified JSON, verifies parsed timestamp.
  • TestClientGetRepoInfoDefaultRevision: verifies empty revision defaults to "main".
  • TestClientGetRepoInfoNotFound: verifies NotFoundError on 404.

Backward compatibility: FromPath, FromPaths, and FromDirectory all use variadic optional params, so existing callers are unaffected.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @veeceey, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves the problem of non-deterministic OCI digests when pulling HuggingFace models. By incorporating the lastModified timestamp from the HuggingFace API and providing options for setting a deterministic timestamp during model building, this PR ensures content-addressability and improves the reliability of model distribution.

Highlights

  • Deterministic OCI Digests: This PR addresses the issue of non-deterministic OCI digests caused by using time.Now() as the creation timestamp, ensuring that identical model content always yields the same artifact.
  • Build Options: Introduces WithCreated(time.Time) build option to FromPath/FromPaths and WithCreatedTime(time.Time) directory option to allow callers to supply a deterministic timestamp.
  • HuggingFace Integration: Adds GetRepoInfo() to fetch repo metadata, including lastModified, from the HuggingFace API and uses it as the creation timestamp for improved content-addressability.
  • Testing: Includes new tests to verify that the same file + same timestamp = identical digest, different timestamps = different digests, and tests for GetRepoInfo.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Activity
  • Added WithCreated(time.Time) build option to FromPath/FromPaths.
  • Added WithCreatedTime(time.Time) directory option.
  • Added GetRepoInfo() to fetch repo metadata from the HuggingFace API.
  • Modified HuggingFace model builder to fetch lastModified and use it as the creation timestamp.
  • Added new tests to verify deterministic digests and GetRepoInfo functionality.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In GetRepoInfo, you rely on the default time.Time JSON decoding for lastModified; consider unmarshalling into a string and parsing with an explicit layout (matching HuggingFace’s format) so you have a controlled failure mode if their timestamp format changes.
  • The logic for selecting the created timestamp is now duplicated in fromFormat and FromDirectory; consider factoring this into a small helper or reusing the same options type so future changes to creation-time handling stay consistent.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `GetRepoInfo`, you rely on the default `time.Time` JSON decoding for `lastModified`; consider unmarshalling into a string and parsing with an explicit layout (matching HuggingFace’s format) so you have a controlled failure mode if their timestamp format changes.
- The logic for selecting the `created` timestamp is now duplicated in `fromFormat` and `FromDirectory`; consider factoring this into a small helper or reusing the same options type so future changes to creation-time handling stay consistent.

## Individual Comments

### Comment 1
<location> `pkg/distribution/huggingface/client.go:186-191` </location>
<code_context>
+// GetRepoInfo fetches repository metadata from the HuggingFace API.
+// This returns information such as the last modified timestamp, which is useful
+// for producing deterministic OCI digests.
+func (c *Client) GetRepoInfo(ctx context.Context, repo, revision string) (*RepoInfo, error) {
+	if revision == "" {
+		revision = "main"
+	}
+
+	url := fmt.Sprintf("%s/api/models/%s/revision/%s", c.baseURL, repo, revision)
+
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, http.NoBody)
</code_context>

<issue_to_address>
**issue (bug_risk):** Revision (and possibly repo) should be URL-escaped to handle slashes and special characters correctly.

Since `revision` is interpolated directly into the path, values like `refs/pr/1` will be interpreted as extra path segments instead of a single revision. Percent-encode the revision (and likely `repo`) using `url.URL`/`path.Join` and `url.PathEscape`, consistent with other endpoints, so repos/revisions containing slashes or special characters work correctly.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +186 to +191
func (c *Client) GetRepoInfo(ctx context.Context, repo, revision string) (*RepoInfo, error) {
if revision == "" {
revision = "main"
}

url := fmt.Sprintf("%s/api/models/%s/revision/%s", c.baseURL, repo, revision)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Revision (and possibly repo) should be URL-escaped to handle slashes and special characters correctly.

Since revision is interpolated directly into the path, values like refs/pr/1 will be interpreted as extra path segments instead of a single revision. Percent-encode the revision (and likely repo) using url.URL/path.Join and url.PathEscape, consistent with other endpoints, so repos/revisions containing slashes or special characters work correctly.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively addresses the issue of non-deterministic OCI digests by introducing a mechanism to provide a fixed creation timestamp. The implementation uses the functional options pattern cleanly in the builder, and gracefully falls back to the old behavior if a timestamp cannot be fetched from the HuggingFace API. The changes are well-tested. I have a couple of minor suggestions to improve consistency and observability.

Comment on lines +53 to +60
// WithCreatedTime sets a specific creation timestamp for the model artifact
// built from a directory. When not set, the current time (time.Now()) is used.
// This is useful for producing deterministic OCI digests when the same directory
// content should always yield the same artifact regardless of when it was built.
func WithCreatedTime(t time.Time) DirectoryOption {
return func(opts *DirectoryOptions) {
opts.Created = &t
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with the BuildOption defined in pkg/distribution/builder/builder.go, it would be better to name this function WithCreated instead of WithCreatedTime. Both functions serve the same purpose of setting a creation timestamp, and using the same name improves code clarity and maintainability. You will need to update the call sites in pkg/distribution/builder/from_directory_test.go as well.

Suggested change
// WithCreatedTime sets a specific creation timestamp for the model artifact
// built from a directory. When not set, the current time (time.Now()) is used.
// This is useful for producing deterministic OCI digests when the same directory
// content should always yield the same artifact regardless of when it was built.
func WithCreatedTime(t time.Time) DirectoryOption {
return func(opts *DirectoryOptions) {
opts.Created = &t
}
// WithCreated sets a specific creation timestamp for the model artifact
// built from a directory. When not set, the current time (time.Now()) is used.
// This is useful for producing deterministic OCI digests when the same directory
// content should always yield the same artifact regardless of when it was built.
func WithCreated(t time.Time) DirectoryOption {
return func(opts *DirectoryOptions) {
opts.Created = &t
}
}

Comment on lines +81 to +85
var createdTime *time.Time
repoInfo, err := client.GetRepoInfo(ctx, repo, revision)
if err == nil && !repoInfo.LastModified.IsZero() {
createdTime = &repoInfo.LastModified
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error from client.GetRepoInfo is silently ignored. While falling back to time.Now() is the correct behavior, logging this error at a warning level would improve observability. This would help users understand why they might not be getting deterministic OCI digests if they were expecting them.

Suggested change
var createdTime *time.Time
repoInfo, err := client.GetRepoInfo(ctx, repo, revision)
if err == nil && !repoInfo.LastModified.IsZero() {
createdTime = &repoInfo.LastModified
}
var createdTime *time.Time
repoInfo, err := client.GetRepoInfo(ctx, repo, revision)
if err != nil {
log.Printf("Warning: could not fetch repo info for deterministic timestamp: %v. Falling back to current time.", err)
} else if !repoInfo.LastModified.IsZero() {
createdTime = &repoInfo.LastModified
}

@ericcurtin
Copy link
Contributor

This looks ok, so far. Could you go through the Gemini comments @veeceey ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Hugging Face models produce non-deterministic OCI digests due to time.Now() in creation date

2 participants