Hacker News

276

Pg_lake: Postgres with Iceberg and data lake access

by plaur782 1762272747 | 81 comments
How do you use your data lake? For me it is much more than just storing data; it is just as much for crunching numbers in unpredictable ways.

And this is where postgres does not cut it.

You need more CPU and RAM than what you pay for in your Postgres instance, i.e. a distributed engine where you don't have to worry about how big your database instance is today.

by fifilura 1762308939
This is huge!

When people ask me what’s missing in the Postgres market, I used to tell them “open source Snowflake.”

Crunchy’s Postgres extension is by far the most advanced solution on the market.

Huge congrats to Snowflake and the Crunchy team on open sourcing this.

by ozgune 1762273123
Why not just use DuckLake?[1] That reduces complexity[2] since only DuckDB and PostgreSQL with pg_duckdb are required.

[1] https://ducklake.select/

[2] DuckLake - The SQL-Powered Lakehouse Format for the Rest of Us by Prof. Hannes Mühleisen: https://www.youtube.com/watch?v=YQEUkFWa69o

by boshomi 1762279897
When Snowflake bought Crunchy Data I was hoping they were going to offer a managed version of this.

It's great that I can run this locally in a Docker container, but I'd love to be able to run a managed instance on AWS billed through our existing Snowflake account.

by anentropic 1762276997
Man, we are living in the golden era of PostgreSQL.
by gajus 1762276616
I’m not a data engineer but work in an adjacent role. Is there anyone here who could dumb the use case down? Maybe an example of a problem this solves. I am struggling to understand the value proposition here.
by NeutralCrane 1762300699
With S3 Table Buckets, Cloudflare R2 Data Catalog and now this, Iceberg seems to be winning.
by ayhanfuat 1762274282
Why would Snowflake develop and release this? Doesn't this cannibalize their main product?
by dharbin 1762275223
This is so cool! We have files in Iceberg, and we move data to and from a PG db using a custom utility. It always felt more like a workaround that didn’t fully use the capabilities of both technologies. Can’t wait to try this out.
by darth_avocado 1762280041
I was going to ask if you could then put DuckDB over Postgres for the OLAP query engine -- looks like that's already what it does! Very interesting development in the data lake space alongside DuckLake and things.
by dkdcio 1762274478
This is awesome, I will be trying this out in the coming months. It's just made it to the top of my R&D shortlist for things that could massively simplify our data stack for a B2B SaaS.
by pjd7 1762293607
Very cool. One question that comes up for me is whether pg_lake expects to control the Iceberg metadata, or whether it can be used purely as a read layer. If I make schema updates and partition changes to Iceberg directly, without going through pg_lake, will pg_lake's catalog correctly reflect things right away?
by spenczar5 1762279732
Nice, does this also allow me to write to Parquet from my Postgres table?
by lysecret 1762285728
More integrations are great. Anyway, the "this is awesome" moment (for me) will be when you can mix row- and column-oriented tables in Postgres, a bit like Timescale but native Postgres and well done. Hopefully one day.
by drchaim 1762281532
I’m not super into the data sphere, but my company relies heavily on Snowflake, which is becoming an issue.

This announcement seems huge to me, no?!

Is this really an open source Snowflake covering most use cases?

by apexalpha 1762284424
Can someone dumb this down a bit for a non data-engineer? Hard to fully wrap my head around who this is/isn’t best suited for.
by claudeomusic 1762286115
I love this. There are definitely shops where the data is a bit too much for postgres but something like Snowflake would be overkill. Wish this was around a couple years ago lol
by fridder 1762280067
If anyone from Supabase is reading, it would be awesome to have this extension!
by iamcreasy 1762292742
This is really nice, though looking at the code, a lot of the Postgres types are missing, as well as a lot of the newer Parquet logical types. Still, this is a great start and a nice use of FDW.
by inglor 1762277831
This is cool to see! Looks like a competitor to pg_mooncake, which Databricks acquired. But how is this different from pg_duckdb?
by harisund1990 1762283643
Very cool! Was there any inherent limitation in PostgreSQL or its extension system that forced pg_lake to use DuckDB as the query engine?
by iamcreasy 1762279271
Interesting! How does it compare with DuckLake?
by oulipo2 1762277387
Crunchy Data did it first :) but nice to get more options
by scirob 1762291625
I love postgres and have created my own "data lake" sorta systems -- what would this add to my workflows?
by chaps 1762275688
RDS really needs to make it easy to install your own PG modules.
by whalesalad 1762301911
Curious why pgduck_server is a totally separate process?
by beoberha 1762274839
Does anyone know how access control works for the underlying S3 objects? I didn’t see anything regarding grants in the docs.
by mberning 1762275525