Skip to content

feat(arrow/extensions): add support for geoarrow.point#545

Open
Mandukhai-Alimaa wants to merge 1 commit intoapache:mainfrom
Mandukhai-Alimaa:feat/implement-geoarrow
Open

feat(arrow/extensions): add support for geoarrow.point#545
Mandukhai-Alimaa wants to merge 1 commit intoapache:mainfrom
Mandukhai-Alimaa:feat/implement-geoarrow

Conversation

@Mandukhai-Alimaa
Copy link
Contributor

Rationale for this change

There is no support for geoarrow in go

What changes are included in this PR?

Add support for geoarrow.point and register it as extension type and add tests for the geoarrow.point

Are these changes tested?

Yes, the newly added test and all the other tests are passing.

Are there any user-facing changes?

Users will be able to use geoarrow.point as an extension type.

@Mandukhai-Alimaa
Copy link
Contributor Author

This is part of solving #504. On the last PR to add support for the final geoarrow data type will add the closes tag and close the issue

Comment on lines +63 to +65
func (p Point) IsEmpty() bool {
return math.IsNaN(p.X) && math.IsNaN(p.Y) && math.IsNaN(p.Z) && math.IsNaN(p.M)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment says "NaN or zero" but you're only checking for NaN here

Comment on lines +66 to +71
// GeometryEncoding represents the encoding method for geometry data
type GeometryEncoding int

const (
EncodingGeoArrow GeometryEncoding = iota
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what would be the alternate to this that would be the other constants?

Comment on lines +111 to +112
// Encoding specifies the geometry encoding format
Encoding GeometryEncoding `json:"encoding,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this in the spec, where is this member coming from?

Comment on lines +114 to +115
// CRS contains PROJJSON coordinate reference system information
CRS json.RawMessage `json:"crs,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it should be PROJJSON it could also be other formats

}

// GeometryMetadata contains metadata for GeoArrow geometry types
type GeometryMetadata struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're missing the optional crs_type member which should be one of

  • "projjson"
  • "wkt2:2019"
  • "authority_code"
  • "srid"

// Value returns the Point at the given index
func (p *PointArray) Value(i int) Point {
pointType := p.ExtensionType().(*PointType)
structArray := p.Storage().(*array.Struct)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to determine whether it is interleaved or separate before you assume it's a struct

point := Point{Dimension: pointType.dimension}

if p.IsNull(i) {
return point
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use NewEmptyPoint?

Comment on lines +295 to +307
switch point.Dimension {
case DimensionXY:
return []float64{point.X, point.Y}
case DimensionXYZ:
return []float64{point.X, point.Y, point.Z}
case DimensionXYM:
return []float64{point.X, point.Y, point.M}
case DimensionXYZM:
return []float64{point.X, point.Y, point.Z, point.M}
default:
// Should never happen but defensive programming
panic(fmt.Sprintf("unknown coordinate dimension: %v", point.Dimension))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implement MarshalJSON on the Point type and then just use that

Comment on lines +334 to +335
structBuilder := b.Builder.(*array.StructBuilder)
structBuilder.Append(true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above needs to handle the interleaved OR separate cases


// AppendNull appends a null value to the array
func (b *PointBuilder) AppendNull() {
b.ExtensionBuilder.Builder.(*array.StructBuilder).AppendNull()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above, need to handle the FixedSizeList case

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool!

I will take a closer look over the weekend, but I did want to quickly point out up front that geoarrow.wkb is the only one required for GeoParquet/Parquet Geometry/Geography support (because in Parquet land everything is WKB binary). Functionally that just means there's a tiny bit of CRS translation that has to happen on read/write (and a bit of WKB parsing if you would like to support writing statistics).

Implementing the other extension types is also very cool but fairly difficult (the matrix expansion of point/linestring/polygon/multipoint/multilinestring/multipolygon by xy/xyz/xym/xyzm by interleaved/separated is sort of insane to deal with and is not widely used).

@Mandukhai-Alimaa
Copy link
Contributor Author

@paleolimbot, before you spend time looking at this, I will address @zeroshade's feedback and fix some stuff first.

But for the bigger picture, do you think I should punt on geoarrow.point and start with geoarrow.wkb ? It sounds like it would provide more value. What do you think?

@paleolimbot
Copy link
Member

But for the bigger picture, do you think I should punt on geoarrow.point and start with geoarrow.wkb ?

If you are aiming for Parquet Geometry/Geography support, yes, because geoarrow.point isn't used in any part of that. The metadata for geoarrow.wkb is the same as for geoarrow.point (e.g., CRS, edges), and so I think most of this PR is equally relevant to that.

@dwilson1988
Copy link

@Mandukhai-Alimaa - I'm looking into the iceberg side of this currently and am happy to help take this across the finish line. I think @paleolimbot is probably right about WKB being the priority here - I'll start with this branch and change it to support WKB first with a route to support non-encoded native types as well as a future PR

@Mandukhai-Alimaa
Copy link
Contributor Author

@dwilson1988 that sounds good. Thank you.

@zeroshade
Copy link
Member

@dwilson1988 feel free to ping me when you update this so I know to take another pass/look at this!

@dwilson1988
Copy link

dwilson1988 commented Mar 7, 2026

@zeroshade - I'm going to create a separate PR with the general pattern I came up with and an implementation for the WKB encoding.

Had a question though. For Go, https://github.com/twpayne/go-geom is a pretty solid package that has memory layout virtually identical to GeoArrow - it's used in a lot of Go that uses geometry. (Disclaimer: my company uses it a lot internally). Would you be okay adding it as a dependency? That would allow all GeoArrow types (including WKB/WKT) to Value as a geom.T very efficiently and I think make the whole thing much more useful.

Alternatively, I have a decent path to make it fairly easy for users to register their own external geometry types I think (but I got to play around a bit more to be sure it works well enough)

@dwilson1988
Copy link

I should be more clear - WKT/WKT would Value as geom.T but the others as their native go-geom types (*geom.Point, *geom.Polygon, etc)

@paleolimbot
Copy link
Member

I would personally recommend inlining the logic you need unless there's a good way to add an optional dependency. I think I upstreamed SedonaDB's interval handling into arrow-rs/parquet-geospatial which should have all you need (mainstream geometry implementations mostly don't handle wraparound intervals and so they aren't all that useful anyway).

@dwilson1988
Copy link

Dewey - was more interested in the types themselves since they are robust implementations that are fairly ubiquitous in the Go world. The WKT/WKB parsing itself wouldn't be terrible, but the geometry types themselves would be essentially recreating what's already in go-geom.

@paleolimbot
Copy link
Member

I am not familiar with the Go internals, but I'm surprised that you need concrete types for anything? (At least in terms of minimal requirement for writing statistics or doing pruning, I am not sure any of those types would help?)

@dwilson1988
Copy link

In arrow? We can provide the raw structs and such, but that puts a lot of burden on the user to get a concrete type. I agree that with iceberg we don't need much, but for a particular Value, getting the raw arrow structs is less immediately useful than getting a Go type back (as is already done for other types like uuid).

)

// Point represents a point geometry with coordinates
type Point struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paleolimbot - I meant for these types. For all the concrete geoarrow native encodings, I wanted to use go-geom types for each instead of redefining new ones.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants