Instead of using integer IDs for the hexes, we created an encoded version of the ID with the property that removing a character gives you the containing parent cell. This means we can do basic containment queries by querying with a low-resolution hex (a short string) as a prefix query. If a GPS track goes through that larger parent cell, the track will have hexes with the same prefix. You don't get perfect control of distances because hexes have varying diameters (or rather approximate diameters, since they're hexagons, not circles), but in practice, at scale, for a product that doesn't require high precision, it's very effective.
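A rough sketch of the idea (not the commenter's actual code): it assumes h3-py v4 function names, uses the 15-character res-0 cell string as a fixed-length root token, and appends one child-position digit per resolution, so dropping a trailing digit gives the parent's ID.

```python
import h3

DIGITS = "0123456"  # every H3 cell has at most 7 children

def prefix_id(cell: str) -> str:
    """Root token plus one child-position digit per resolution level."""
    res = h3.get_resolution(cell)
    out = [h3.cell_to_parent(cell, 0)]                      # fixed-length root token
    for r in range(1, res + 1):
        parent = h3.cell_to_parent(cell, r - 1)
        ancestor = h3.cell_to_parent(cell, r)
        siblings = list(h3.cell_to_children(parent, r))     # 6 or 7 cells
        out.append(DIGITS[siblings.index(ancestor)])        # position within the parent
    return "".join(out)

# Index a GPS track as a set of res-9 prefix IDs, then ask whether it passes through
# a coarse res-6 cell with a plain "starts with" check (or a prefix query in a store).
track = [(37.7749, -122.4194), (37.7790, -122.4150)]        # made-up points
track_ids = {prefix_id(h3.latlng_to_cell(lat, lng, 9)) for lat, lng in track}
coarse = prefix_id(h3.latlng_to_cell(37.7749, -122.4194, 6))
print(any(i.startswith(coarse) for i in track_ids))
```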
I think by the end of this year we'll have about 6 TB of these hex sets in a four-node, eight-process ES cluster. Performance is pretty good. It also acts as our full-text search; half the time we want a geo search, we also want keyword matching / filtering / etc. on the metadata of these trips.
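A combined query might look something like this; the index and field names are made up, and it assumes the 8.x Elasticsearch Python client:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
coarse_prefix = "8029fffffffffff012345"   # placeholder prefix ID like the ones sketched above

resp = es.search(
    index="trips",
    query={
        "bool": {
            "must": [{"match": {"description": "gravel climb"}}],   # full-text part
            "filter": [
                {"prefix": {"hex_cells": coarse_prefix}},            # geo containment via prefix
                {"term": {"activity_type": "ride"}},                 # plain metadata filter
            ],
        }
    },
)
print(resp["hits"]["total"])
```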
Pretty fun system to build, and the concept works with a wide variety of data stores. It felt like a total hack job, but it has stood the test of time.
Thanks Uber, H3 is a great library!
If joins are a critical, performance-sensitive operation, the most important property of a DGGS is congruency, i.e. that child cells exactly tile their parent. H3 is not congruent; it was optimized for visualization, where congruency doesn't matter, rather than for analytical computation. For example, the article talks about deduplication, which isn't even necessary with a congruent DGGS. You can do joins with H3, but as a general rule it's not recommended unless the data is small enough that you can afford to brute-force it to some extent.
H3 is great for point-geometry aggregates; it shines at that. Not so much at geospatial joins, though. DGGSs optimized for analytical computation (and, by implication, joins) do exist; they just aren't optimal for trivial visualization.
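A quick way to see the non-congruency, assuming h3-py v4: for points near cell edges, the res-8 cell that geographically contains the point is not always the index parent of the point's res-9 cell, which is one reason parent/prefix-based containment and joins produce edge effects that need cleanup.

```python
import random
import h3

random.seed(42)
trials, mismatches = 20000, 0
for _ in range(trials):
    # random points in a small, arbitrary area around San Francisco
    lat = 37.7 + random.random() * 0.1
    lng = -122.5 + random.random() * 0.1
    fine = h3.latlng_to_cell(lat, lng, 9)      # res-9 cell containing the point
    coarse = h3.latlng_to_cell(lat, lng, 8)    # res-8 cell containing the point
    # The index parent of the fine cell is not always the coarse cell
    # that geographically contains the point:
    if h3.cell_to_parent(fine, 8) != coarse:
        mismatches += 1
print(f"{mismatches}/{trials} points fall outside their res-9 cell's res-8 parent")
```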
At 500 stations:
- H3: 218µs, 4.7KB, 109 allocs
- Fallback: 166µs, 1KB, 37 allocs
- Fallback is 31% faster
At 1000 stations:
- H3: 352µs, 4.7KB, 109 allocs
- Fallback: 312µs, 1KB, 37 allocs
- Fallback is 13% faster
At 2000 stations:
- H3: 664µs, 4.7KB, 109 allocs
- Fallback: 613µs, 1KB, 37 allocs
- Fallback is 8% faster
At 4500 stations (real-world scale):
- H3: 1.40ms, 4.7KB, 109 allocs
- Fallback: 1.34ms, 1KB, 37 allocs
- Fallback is 4% faster
Conclusion: the gap narrows as the station count increases. At 4500 stations they're nearly equivalent. H3 has a fixed overhead (~4.7 KB / 109 allocs for the k=2 ring), while the fallback scales linearly. The crossover point where H3 wins is likely around 10-20K entries.
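The numbers read like Go benchmark output; purely as illustration, here is a Python sketch of the two strategies as I understand them from the description (made-up station data and resolution, assuming h3-py v4): a lookup table keyed by H3 cell probed over a k=2 grid_disk, versus a plain linear scan.

```python
import math
import h3

RES = 8  # hypothetical indexing resolution

def dist(lat1, lng1, lat2, lng2):
    # Equirectangular approximation; good enough for ranking nearby stations.
    x = math.radians(lng2 - lng1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return math.hypot(x, y)

def build_index(stations):
    """Map H3 cell -> list of stations located in that cell."""
    index = {}
    for s in stations:
        index.setdefault(h3.latlng_to_cell(s["lat"], s["lng"], RES), []).append(s)
    return index

def nearest_via_h3(index, lat, lng, k=2):
    """Only consider stations inside the k-ring around the query point's cell."""
    candidates = []
    for cell in h3.grid_disk(h3.latlng_to_cell(lat, lng, RES), k):
        candidates.extend(index.get(cell, []))
    return min(candidates, key=lambda s: dist(lat, lng, s["lat"], s["lng"]), default=None)

def nearest_fallback(stations, lat, lng):
    """Brute-force linear scan over every station."""
    return min(stations, key=lambda s: dist(lat, lng, s["lat"], s["lng"]), default=None)

# usage:
# stations = [{"id": 1, "lat": 52.52, "lng": 13.405}, ...]
# idx = build_index(stations)
# print(nearest_via_h3(idx, 52.53, 13.40), nearest_fallback(stations, 52.53, 13.40))
```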
I wonder how it compares with geohashing. I know geohashes aren't as efficient in terms of partitioning, and queries end up a bit awkward since you have to handle decoding neighbor cells, but finding all elements of a cell is a "starts with" query, which lets you store the data effectively on most NoSQL databases that have some sort of text sorting.
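For reference, a minimal self-contained geohash encoder (the standard algorithm, nothing project-specific) that shows the truncation property being described: a coarse geohash is a prefix of every finer geohash inside it, so cell membership really is a "starts with" query.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat: float, lng: float, precision: int = 9) -> str:
    """Standard geohash: interleave longitude/latitude bisection bits, 5 bits per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars, bits, n_bits, use_lng = [], 0, 0, True
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            bit = lng >= mid
            lng_lo, lng_hi = (mid, lng_hi) if bit else (lng_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        n_bits += 1
        use_lng = not use_lng
        if n_bits == 5:
            chars.append(BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)

# Truncating a geohash yields the containing cell, so coarse containment is a prefix query:
full = geohash(48.8584, 2.2945, 9)
assert full.startswith(geohash(48.8584, 2.2945, 5))
print(full)
```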