feat(arrow/extensions): add support for geoarrow.point#545
feat(arrow/extensions): add support for geoarrow.point#545Mandukhai-Alimaa wants to merge 1 commit intoapache:mainfrom
Conversation
|
This is part of solving #504. On the last PR to add support for the final geoarrow data type will add the closes tag and close the issue |
| func (p Point) IsEmpty() bool { | ||
| return math.IsNaN(p.X) && math.IsNaN(p.Y) && math.IsNaN(p.Z) && math.IsNaN(p.M) | ||
| } |
There was a problem hiding this comment.
the comment says "NaN or zero" but you're only checking for NaN here
| // GeometryEncoding represents the encoding method for geometry data | ||
| type GeometryEncoding int | ||
|
|
||
| const ( | ||
| EncodingGeoArrow GeometryEncoding = iota | ||
| ) |
There was a problem hiding this comment.
what would be the alternate to this that would be the other constants?
| // Encoding specifies the geometry encoding format | ||
| Encoding GeometryEncoding `json:"encoding,omitempty"` |
There was a problem hiding this comment.
I don't see this in the spec, where is this member coming from?
| // CRS contains PROJJSON coordinate reference system information | ||
| CRS json.RawMessage `json:"crs,omitempty"` |
There was a problem hiding this comment.
While it should be PROJJSON it could also be other formats
| } | ||
|
|
||
| // GeometryMetadata contains metadata for GeoArrow geometry types | ||
| type GeometryMetadata struct { |
There was a problem hiding this comment.
you're missing the optional crs_type member which should be one of
- "projjson"
- "wkt2:2019"
- "authority_code"
- "srid"
| // Value returns the Point at the given index | ||
| func (p *PointArray) Value(i int) Point { | ||
| pointType := p.ExtensionType().(*PointType) | ||
| structArray := p.Storage().(*array.Struct) |
There was a problem hiding this comment.
need to determine whether it is interleaved or separate before you assume it's a struct
| point := Point{Dimension: pointType.dimension} | ||
|
|
||
| if p.IsNull(i) { | ||
| return point |
| switch point.Dimension { | ||
| case DimensionXY: | ||
| return []float64{point.X, point.Y} | ||
| case DimensionXYZ: | ||
| return []float64{point.X, point.Y, point.Z} | ||
| case DimensionXYM: | ||
| return []float64{point.X, point.Y, point.M} | ||
| case DimensionXYZM: | ||
| return []float64{point.X, point.Y, point.Z, point.M} | ||
| default: | ||
| // Should never happen but defensive programming | ||
| panic(fmt.Sprintf("unknown coordinate dimension: %v", point.Dimension)) | ||
| } |
There was a problem hiding this comment.
implement MarshalJSON on the Point type and then just use that
| structBuilder := b.Builder.(*array.StructBuilder) | ||
| structBuilder.Append(true) |
There was a problem hiding this comment.
same comment as above needs to handle the interleaved OR separate cases
|
|
||
| // AppendNull appends a null value to the array | ||
| func (b *PointBuilder) AppendNull() { | ||
| b.ExtensionBuilder.Builder.(*array.StructBuilder).AppendNull() |
There was a problem hiding this comment.
same comment as above, need to handle the FixedSizeList case
paleolimbot
left a comment
There was a problem hiding this comment.
Very cool!
I will take a closer look over the weekend, but I did want to quickly point out up front that geoarrow.wkb is the only one required for GeoParquet/Parquet Geometry/Geography support (because in Parquet land everything is WKB binary). Functionally that just means there's a tiny bit of CRS translation that has to happen on read/write (and a bit of WKB parsing if you would like to support writing statistics).
Implementing the other extension types is also very cool but fairly difficult (the matrix expansion of point/linestring/polygon/multipoint/multilinestring/multipolygon by xy/xyz/xym/xyzm by interleaved/separated is sort of insane to deal with and is not widely used).
|
@paleolimbot, before you spend time looking at this, I will address @zeroshade's feedback and fix some stuff first. But for the bigger picture, do you think I should punt on |
If you are aiming for Parquet Geometry/Geography support, yes, because geoarrow.point isn't used in any part of that. The metadata for geoarrow.wkb is the same as for geoarrow.point (e.g., CRS, edges), and so I think most of this PR is equally relevant to that. |
|
@Mandukhai-Alimaa - I'm looking into the iceberg side of this currently and am happy to help take this across the finish line. I think @paleolimbot is probably right about WKB being the priority here - I'll start with this branch and change it to support WKB first with a route to support non-encoded native types as well as a future PR |
|
@dwilson1988 that sounds good. Thank you. |
|
@dwilson1988 feel free to ping me when you update this so I know to take another pass/look at this! |
|
@zeroshade - I'm going to create a separate PR with the general pattern I came up with and an implementation for the WKB encoding. Had a question though. For Go, https://github.com/twpayne/go-geom is a pretty solid package that has memory layout virtually identical to GeoArrow - it's used in a lot of Go that uses geometry. (Disclaimer: my company uses it a lot internally). Would you be okay adding it as a dependency? That would allow all GeoArrow types (including WKB/WKT) to Alternatively, I have a decent path to make it fairly easy for users to register their own external geometry types I think (but I got to play around a bit more to be sure it works well enough) |
|
I should be more clear - WKT/WKT would |
|
I would personally recommend inlining the logic you need unless there's a good way to add an optional dependency. I think I upstreamed SedonaDB's interval handling into arrow-rs/parquet-geospatial which should have all you need (mainstream geometry implementations mostly don't handle wraparound intervals and so they aren't all that useful anyway). |
|
Dewey - was more interested in the types themselves since they are robust implementations that are fairly ubiquitous in the Go world. The WKT/WKB parsing itself wouldn't be terrible, but the geometry types themselves would be essentially recreating what's already in go-geom. |
|
I am not familiar with the Go internals, but I'm surprised that you need concrete types for anything? (At least in terms of minimal requirement for writing statistics or doing pruning, I am not sure any of those types would help?) |
|
In arrow? We can provide the raw structs and such, but that puts a lot of burden on the user to get a concrete type. I agree that with iceberg we don't need much, but for a particular Value, getting the raw arrow structs is less immediately useful than getting a Go type back (as is already done for other types like |
| ) | ||
|
|
||
| // Point represents a point geometry with coordinates | ||
| type Point struct { |
There was a problem hiding this comment.
@paleolimbot - I meant for these types. For all the concrete geoarrow native encodings, I wanted to use go-geom types for each instead of redefining new ones.
There was a problem hiding this comment.
Rationale for this change
There is no support for geoarrow in go
What changes are included in this PR?
Add support for geoarrow.point and register it as extension type and add tests for the geoarrow.point
Are these changes tested?
Yes, the newly added test and all the other tests are passing.
Are there any user-facing changes?
Users will be able to use geoarrow.point as an extension type.