OLAP vs OLTP - data lakes and the three-layer architecture question
Hey folks,
I have a really simple question, and I feel kind of dumb asking it - it's ELI5 time.
When you run your data lakes, or your three-layer architectures, what format is your data in for each stage?
We're in Sql at the moment and it's really simple for me to use OLTP so that when I am updating an order record, I can just merge on that record.
When I read about data lakes, and parquet, it sounds like you're uploading your raw and staging data in the columnar format files, and then actioning the stages in parquet, or in a data warehouse like snowflake or databricks.
Isn't there a large performance issue when you need to update individual records in columnar storage?
Wouldn't it be better for it to remain in row-based through to the point you want to aggregate results for presentation?
I keep reading about how columnar storage is slow on write, fast on read, and wonder why it sounds like transformations aren't kept in a fast-write environment until the final step. Am I missing something?