Shared Data in the Cloud: What’s Next?

Lena
3 min read · Apr 22, 2021

If you enjoyed this article, share it and follow @lenadroid on Twitter for more insights.

Let’s look at the options for data sharing in the cloud and how they might shift in the future. This is relevant for many organizations facing challenges with data movement cost and speed of access, or weighing the risks of trying out new and promising approaches.

Multi-cloud comes into play as well. With improving interoperability between technologies, many companies want to choose the best services and products for their scenarios, even if those services live on different clouds. For example, Cloud X is good for A, Cloud Y for B. This leads to data distributed over different services, regions, or clouds (even something as basic as logging information can end up on different clouds). Organizations still want to be able to query that data efficiently and at minimal cost.

There are challenges and trade-offs. Let’s look at options.

First — the simplest and most common setup.

Several services are running in the same cloud. Compute and storage aren’t independent. To work with the data from another service, we’d have to copy or move it. There is nothing wrong with this for simple architectures, if it meets your criteria.

Second — quickly gaining adoption.

Several services in the same cloud. Compute and storage are completely separate. This gives better scalability, cost-efficiency, and direct access to data. Examples: Azure Synapse Link for Azure Cosmos DB, BigQuery external data sources, Snowflake data sharing.
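To make the difference between the first two setups concrete, here is a minimal toy sketch in Python. It is not how any of the products above work internally; the class names (`Storage`, `CoupledService`, `ComputeEngine`) and the sample `orders` data are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """A hypothetical storage layer, independent of any compute engine."""
    tables: dict = field(default_factory=dict)


@dataclass
class CoupledService:
    """First setup: the service owns its storage, so sharing means copying."""
    name: str
    storage: Storage = field(default_factory=Storage)

    def import_from(self, other: "CoupledService", table: str) -> int:
        rows = list(other.storage.tables[table])  # physical copy of every row
        self.storage.tables[table] = rows
        return len(rows)  # number of rows moved between services


@dataclass
class ComputeEngine:
    """Second setup: compute only holds a reference to shared storage."""
    name: str
    storage: Storage

    def total(self, table: str, column: str) -> float:
        # Rows are read in place; nothing is copied into the engine.
        return sum(row[column] for row in self.storage.tables[table])


# Hypothetical sample data.
orders = [{"id": 1, "amount": 20.0}, {"id": 2, "amount": 5.5}]

# Coupled: the analytics service needs its own copy of the data.
app = CoupledService("app")
app.storage.tables["orders"] = orders
analytics = CoupledService("analytics")
rows_copied = analytics.import_from(app, "orders")

# Decoupled: two engines query the same storage directly.
shared = Storage({"orders": orders})
reporting = ComputeEngine("reporting", shared)
ml = ComputeEngine("ml", shared)
```

In the coupled case every consumer pays for a copy (`rows_copied` grows with the table), while in the decoupled case `reporting` and `ml` can be added or scaled independently because they point at the same `Storage` object.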

Third — common in third-party managed systems, or systems you manage that span multiple clouds and/or regions.

Consumers can access data from both clouds, but this requires replication (with its cost and time), even if it happens under the hood. E.g. cross-region data sharing in Snowflake.

Fourth: multi-cloud no-copy sharing.

✔️ Multiple clouds
✔️ Storage across multiple different-cloud services
✔️ Separation of compute and storage
✔️ Direct data access — no need to copy or move data

Enabled by technologies like Azure Arc or Google Anthos. E.g. BigQuery Omni, Azure Arc-enabled data services.
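The idea of direct access without copying can be sketched as sending a small computation to each location and moving only the results. This is a toy model under stated assumptions, not a description of how BigQuery Omni or Azure Arc are implemented; `Location`, `run_local`, and the sample log rows are hypothetical names and data.

```python
from dataclasses import dataclass


@dataclass
class Location:
    """A hypothetical data location, e.g. one cloud or region."""
    name: str
    rows: list

    def run_local(self, fn):
        # The function executes where the data lives; only its result leaves.
        return fn(self.rows)


def count_errors(rows):
    """Count log rows whose level is ERROR."""
    return sum(1 for r in rows if r["level"] == "ERROR")


def query_by_copying(locations):
    """Move every row to one place, then compute."""
    central = [r for loc in locations for r in loc.rows]
    return count_errors(central), len(central)  # (answer, rows moved)


def query_in_place(locations):
    """Send the computation to each location and combine the small results."""
    partials = [loc.run_local(count_errors) for loc in locations]
    return sum(partials), len(partials)  # (answer, values moved)


# Hypothetical logging data living on two different clouds.
cloud_x = Location("cloud-x", [{"level": "ERROR"}, {"level": "INFO"}, {"level": "INFO"}])
cloud_y = Location("cloud-y", [{"level": "ERROR"}, {"level": "ERROR"}])

copied = query_by_copying([cloud_x, cloud_y])
in_place = query_in_place([cloud_x, cloud_y])
```

Both queries return the same answer, but the copy-based version moves every row across clouds, while the in-place version moves one number per location. With real log volumes, that gap is where the cost and latency savings come from.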

Features Common for Better Data Sharing

Two themes are common to the better data sharing options for an organization:

⚡️ Separation of storage and compute, link to other data sources for direct access.

⚡️ Multi-cloud or hybrid platforms that bring compute to data located anywhere (Anthos, Arc).

How It Affects Decision-Making

For cloud providers, this means that companies might not be choosing some of their services simply because those companies have already committed to a certain cloud and are storing their data there.

These multi-cloud technologies still need to develop and mature, but they are really promising and useful.

If companies can store data anywhere with easy access & sharing, they will also adjust decision-making.

I think developer experience, programmability, interoperability, integrations, and open standards will carry more weight when choosing cloud services.

This also helps eliminate siloed teams and monster data lakes, enabling better data sharing for domain-organized data product teams, in line with the data mesh concept.

Thank you for reading!

Lena

Solution Architecture. Distributed systems, big data, data analysis, resilient and operationally excellent systems.