Updated: Aug 24
What Everyone Should Know About Data Residency
Picture this: you're a globe-trotting businessperson living out of a suitcase, hopping from one country to another and one state to another for more than half the year. At the year's close, you suddenly need to tackle a daunting term: tax residency.
You might have stayed in many tax regimes (different countries, or even different states within a country), each with its own rules for when your presence makes you a tax resident. And since the various tax regimes can overlap, you might end up in a messy and potentially unfavorable tax position.
But let's change the scene. It's not you that's traveling, but your data. You might store customer information in Salesforce, keep PII (Personally Identifiable Information) in an Excel sheet on Dropbox, or run a KYC (Know Your Customer) process on a SaaS (Software as a Service) solution in the US. You may not know where your data and, more importantly, your customers' data resides logically (which cloud provider sits under a SaaS product) or physically (which country hosts the data center where your data is collected, stored, or processed).
Your company might have a contract with one of the big cloud providers, and you feel safe that they manage your data, so nothing gets lost. But do you know where your data is stored (at rest)? After all, beneath its glossy sheen, the cloud is just another computer. Worse still, your data might be on a never-ending journey, constantly shuttling (in transit) between different regions and data centers as it moves between you, your customers, and your service providers.
It is important to know the ins and outs of your data's journey, both stationary and in motion. This knowledge shapes the legal and regulatory framework that applies to your data, your responsibilities, and the risks you may encounter.
And this journey lands us at the final checkpoint: compliance. It's like the various tax inspectors scrutinizing your financial documents at the end of the fiscal year. Understanding where your data lies at rest and where it transits is like having your receipts in order. It allows you to navigate the intricate maze of data laws without tripping into the thorny bush of regulatory enforcement. The true challenge isn't just planning your data's global excursion but ensuring its journey remains in sync with the constantly shifting terrain of international regulations.
Data Residency In Practice
Data residency became a much more prominent issue with the emergence of CSPs (Cloud Service Providers), and cloud storage in particular. It gained further weight from data protection regulations and standards such as GDPR (General Data Protection Regulation), HIPAA (Health Insurance Portability and Accountability Act), CCPA (California Consumer Privacy Act), and PCI DSS (Payment Card Industry Data Security Standard), along with various local regulations that require in-country storage of customer data (Russia, China, India).
Data affected by data residency considerations might be stored in databases and file storage. Data storage locations may be chosen for operational reasons (stored close to where it is consumed) or to create resilience for business continuity (storing multiple copies in isolated locations). Disaster recovery rules may necessitate a geographic separation of live data and backup.
Different data classes may be subject to different rules and may need to be stored separately to comply with different laws and regulations. When considering data residency, the most common classes of data are PII (Personally Identifiable Information), healthcare data, credit card / PCI data, and classified or sensitive data.
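One way to make such class-specific rules checkable is a small policy table that maps each data class to the regions where it may be stored. The sketch below is illustrative only: the region names and the policy itself are hypothetical assumptions, and a real table would come out of a legal review.

```python
from enum import Enum


class DataClass(Enum):
    PII = "pii"
    HEALTHCARE = "healthcare"
    PCI = "pci"
    CLASSIFIED = "classified"


# Hypothetical policy: the storage regions each data class may use.
# Region names are placeholders, not real cloud provider identifiers.
ALLOWED_REGIONS = {
    DataClass.PII: {"eu-central", "eu-west"},
    DataClass.HEALTHCARE: {"us-east"},
    DataClass.PCI: {"eu-central", "us-east"},
    DataClass.CLASSIFIED: {"gov-cloud"},
}


def is_storage_allowed(data_class: DataClass, region: str) -> bool:
    """Return True if the policy table permits this class in this region."""
    return region in ALLOWED_REGIONS.get(data_class, set())


def violations(placements):
    """List the (data_class, region) pairs that break the policy."""
    return [(dc, r) for dc, r in placements if not is_storage_allowed(dc, r)]


placements = [
    (DataClass.PII, "eu-west"),
    (DataClass.PII, "us-east"),
    (DataClass.HEALTHCARE, "us-east"),
]
print(violations(placements))
```

Run against an inventory of actual storage placements, a check like this would flag the PII stored in the hypothetical "us-east" region while leaving the other two placements untouched.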
Modern cloud providers (e.g., Google Cloud, Amazon Web Services, Microsoft Azure) make starting easy. Pricing structures may steer a user toward cheaper locations, and the data residency implications of that choice may prove far more severe and costly than the low initial cost of entry into cloud hosting. Recognizing the specific requirements of some industries and governments, larger cloud providers may have separate offerings for larger markets (e.g., healthcare or government) that comply with specific regulations and are certified for use with data in the scope of these certifications.
Technology adds another layer of complexity: it is not always clear whether techniques such as caching or database sharding, which copy or split data across locations, are affected by data residency.
While data residency is typically associated with a specific jurisdiction or location, there are situations where it can have a hybrid nature due to various factors. For example:
Cross-Border Data Transfers: Organizations that operate in multiple countries may need to transfer data across borders to meet business requirements. This can result in data being subject to different legal frameworks depending on location.
Data Replication and Backup: Organizations often replicate and store data in multiple locations to ensure data redundancy and disaster recovery. In these cases, data may have a primary residency location while copies or backups exist in other jurisdictions.
Outsourcing and Third-Party Services: When organizations outsource certain functions or engage third-party service providers, data may be processed or stored by these entities in their infrastructure, which could be located in different jurisdictions.
International Partnerships or Collaborations: When organizations from different countries collaborate or form partnerships, data may need to be shared and stored in both jurisdictions.
The Difference Between Data Residency, Data Sovereignty, and Data Localization
Data residency, data sovereignty, and data localization are related concepts that deal with the storage, handling, and control of data. While they are interconnected, there are subtle differences between them:
Data Residency: Data residency refers to the physical or geographical location where data is stored or processed. It relates to the jurisdiction or region where data is hosted, regardless of who has access to or controls it. Data residency requirements may arise due to legal, regulatory, or contractual obligations.
Data Sovereignty: Data sovereignty refers to the concept that a country or jurisdiction has the authority and control over data generated or collected within its boundaries. It asserts a country's right to govern and regulate data within its jurisdiction, including determining how it is stored, accessed, and protected. Data sovereignty emphasizes the jurisdiction's ability to enforce its laws and regulations regarding data privacy, security, and control.
Data Localization: Data localization refers to the practice of requiring data to be stored or processed within a specific geographic location or jurisdiction. It involves imposing legal or regulatory measures that mandate data to remain within a particular jurisdiction, limiting or prohibiting its transfer outside of that jurisdiction. Data localization measures are often driven by concerns over data protection, national security, or the desire to support local economic interests.
Understand Risk And Exposure
From a practical perspective, business continuity can be greatly affected if a data center goes down temporarily or permanently (e.g., the data center fire at French hosting provider OVH). When properly architected and selected, a CSP can provide data durability that is difficult to match with ordinary IT budgets: "If you store 10,000 objects with us, on average, we may lose one every 10 million years or so. This storage is designed in such a way that we can sustain the concurrent loss of data in two separate storage facilities." (AWS).
In addition to a service quality that is hard to match, cloud providers offer lower prices through economies of scale and better utilization. In exchange, their customers give up an element of control over the underlying technology stack. The further up the stack a service sits, the less control over, and transparency into, the technical implementation is available.
Understanding data residency for IaaS applications might be straightforward (regions and availability zones), but PaaS and SaaS offerings may rely on multiple levels of CSPs to provide the services to an end customer. As providers wrap up other services, issues may arise on each level, and traceability may be difficult. Sometimes the provider can't or won't disclose the underlying architecture details, and the customer needs to rely on policies and procedures to understand where the data is stored.
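The arithmetic behind the quoted figure can be reproduced in a few lines. The sketch assumes an annual per-object durability of 99.999999999% ("eleven nines"), the design target AWS publishes for its object storage. That number is an assumption added here, since the quote itself does not state it.

```python
from fractions import Fraction

# Assumption: 99.999999999% annual durability per object, i.e. a
# 1-in-10^11 chance of losing any given object in a given year.
annual_loss_probability = Fraction(1, 10**11)
objects_stored = 10_000

# Expected number of objects lost per year across the whole collection.
expected_losses_per_year = objects_stored * annual_loss_probability

# Average number of years until one object is expected to be lost.
years_per_expected_loss = 1 / expected_losses_per_year

print(int(years_per_expected_loss))  # 10000000
```

Exact fractions are used instead of floats so that the tiny probability does not pick up rounding error.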
On the one hand, physical/geographic location, the type of data, and different certifications of the cloud provider (e.g., government cloud) need to be considered in the architecture of any application relying on data storage in the cloud. On the other hand, the legal and regulatory frameworks that apply to the data and regional/national regulations may prescribe how data must and can be collected, stored, processed, and transmitted.
Understand Where The Data Resides
To establish your exposure to data residency requirements, you need to establish who owns, processes, and stores what data. The physical storage location and the legal incorporation of the data controller and processor are both critical to understand, as regulations can apply to the foreign data of a local entity or the local data of a foreign entity:
Data Ownership: Identify who owns the data. This goes beyond just the creator of the data; consider who else might have rights or access to it.
Data Processing: Determine who is processing your data and what it's being used for. Remember that processing can include anything from collecting and recording data to organizing, structuring, and storing it.
Data Storage: Find out where your data is physically stored. This includes the location of the data centers and any backups or redundancies.
Legal Incorporation: Identify where the entities controlling and processing your data are legally incorporated. For instance, a US company storing data in the UK might still subject the data to certain US regulations.
User Location: Understand where your users are based. The applicability of regulations such as the EU's GDPR can depend on user location.
Regulatory Exposure: Consider foreign and local laws and how they apply to your data. This includes understanding obligations under laws that require data localization or restrict cross-border data transfers.
Third-Party Access: Analyze any potential security risks surrounding third-party access to data, including from governments and law enforcement agencies.
In critical cases, establishing data residency may require due diligence by technical experts. Trusting generic and public documents from cloud providers may not be sufficient, as violating laws and regulations could lead to significant fines and PR (public relations) issues. The GDPR, for example, sets fines for the most egregious violations at up to 4% of global turnover, and regulators have so far issued more than 1,600 fines totaling more than €2 billion. Its impact is amplified by the ECJ's invalidation of Safe Harbour in 2015 and Privacy Shield in 2020 and by the prohibition on transferring EU citizens' data to countries that have lower data protection standards.
Applicability of regulations could arise from the users' location (GDPR: Is it an EU user?) or the data storage location (e.g., Russian, Indian, and Chinese data localization requirements to store data in-country). On the extreme end, we've seen nationalistic steps to cut off data access (e.g., during the Russo-Ukrainian war) or security risks around third-party access to data (security services, law enforcement). Data localization requirements may be exclusive (data must stay in the country, e.g., Russia) or additive (at least a copy must be accessible in the country, e.g., India's personal data protection bill).
A further challenge is the dynamic nature of laws, regulations, and especially technology. Significant regulatory changes will receive coverage in the press or industry networks, but not all changes to the political, regulatory, social, and technological environment are communicated widely. Cloud providers can change the location of data centers, SaaS providers can change the underlying CSP, and regulations get introduced or changed. Political and legal changes (e.g., Brexit, the ECJ invalidating Safe Harbour and Privacy Shield) or multi-level regulations (e.g., EU and national, US and California) can have a significant impact on legacy architectures that were compliant when initially deployed.
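The difference between exclusive and additive localization can be expressed as a simple check over the set of countries where copies of a dataset are stored. This is a minimal sketch, not a legal test: the function name, the two-letter country codes, and the rule that an empty location set fails are all illustrative assumptions.

```python
from enum import Enum


class LocalizationMode(Enum):
    EXCLUSIVE = "exclusive"  # data must stay in the country
    ADDITIVE = "additive"    # at least one copy must be in the country


def complies(mode: LocalizationMode, home_country: str,
             storage_countries: set) -> bool:
    """Check a set of storage countries against a localization mode."""
    if not storage_countries:
        # Assumption: with no recorded location, compliance cannot be shown.
        return False
    if mode is LocalizationMode.EXCLUSIVE:
        # Every copy must be in the home country.
        return storage_countries == {home_country}
    # Additive: copies abroad are fine as long as one is in the home country.
    return home_country in storage_countries


print(complies(LocalizationMode.EXCLUSIVE, "RU", {"RU", "DE"}))  # False
print(complies(LocalizationMode.ADDITIVE, "IN", {"IN", "SG"}))   # True
```

The same replica set can therefore pass under an additive rule and fail under an exclusive one, which is why backups and replicas need to be part of the inventory.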
Three Data Residency Mitigations
SLAs with cloud service providers help to create an assurance of where data can be stored. However, from a legal perspective, the data controller or processor might still be liable, even if they could subsequently try to recover damages from the CSP based on inadequate SLAs.
Strong encryption can offer technical mitigation around data residency. Adequately encrypted data is practically indistinguishable from random noise, so an adversary gets no indication that an encrypted medium contains useful data. The key is short and can be protected strongly, while the amount of encrypted data can be vast but is useless without access to the key. Apple and Google use this property to render vast amounts of data on mobile phones useless by destroying the keys to the data (e.g., wiping a lost phone can make gigabytes of data useless within a fraction of a second merely by securely erasing a key that's a few hundred bits long).
However, it is unclear where data 'resides' if it is stored encrypted in one country but only ever decrypted and processed, though never stored, outside that country. Aside from mitigating data residency concerns, encryption can further benefit data security, as encrypted data is useless to an unauthorized attacker.
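The key-destruction idea can be illustrated with a toy example. The sketch below uses only the Python standard library and builds a keystream from SHA-256 purely for demonstration; this is not a vetted cipher, and a real system would use an audited implementation of an authenticated cipher plus hardware-backed key storage. The `KeyStore` class and all names are hypothetical.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce, and a block counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big"))
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing data with the keystream."""
    stream = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class KeyStore:
    """Holds keys by ID; shredding a key strands all data under it."""

    def __init__(self):
        self._keys = {}

    def create(self, key_id: str) -> None:
        self._keys[key_id] = secrets.token_bytes(32)  # 256-bit key

    def get(self, key_id: str) -> bytes:
        return self._keys[key_id]  # raises KeyError once shredded

    def shred(self, key_id: str) -> None:
        # Illustration only: a real wipe must also clear memory and backups.
        del self._keys[key_id]


store = KeyStore()
store.create("device-1")
nonce = secrets.token_bytes(16)
plaintext = b"customer record " * 1000  # 16,000 bytes of data
ciphertext = xor_cipher(store.get("device-1"), nonce, plaintext)
recovered = xor_cipher(store.get("device-1"), nonce, ciphertext)

store.shred("device-1")  # the 32-byte key is gone; the ciphertext remains
```

After `shred`, the 16,000 bytes of ciphertext can still sit on any storage medium in any country, but nothing in the system can turn them back into the customer records.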
Data decomposition, distribution, and tokenization are other measures available to mitigate the impact of data residency regulations. Data is split into parts that, by themselves, make little sense. Any exposure of the decomposed data or the tokens would not put the underlying data at risk or allow the identification of individual data points. Tokenization is used, for example, in the payments industry to avoid disclosing payment data to every party involved in a transaction. Counterparties that do not process the payment themselves do not need to know a credit card number or bank account, but they will need some reference in case a payment does not go through.
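A token vault is one common way to implement the tokenization described above. The following is a minimal in-memory sketch under stated assumptions: the `TokenVault` class and the `tok_` prefix are hypothetical, and a production vault would be a hardened, access-controlled service hosted in the jurisdiction where the sensitive data must stay.

```python
import secrets


class TokenVault:
    """Maps sensitive values to random tokens; only the vault can reverse."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so one value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # The token is random, so it reveals nothing about the value.
        token = "tok_" + secrets.token_urlsafe(16)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


vault = TokenVault()
card_number = "4111111111111111"  # widely used test number, not a real card
token = vault.tokenize(card_number)

# Counterparties store and exchange only the token as a payment reference.
print(token.startswith("tok_"))
print(vault.detokenize(token) == card_number)
```

Because the token is generated randomly instead of being derived from the card number, systems that hold only tokens can arguably sit outside the scope of rules that apply to the underlying data, although that assessment depends on the regulation in question.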
In this grand game of digital chess, bringing data back home may seem like a comforting move, but it can also open up new vulnerabilities. The truth is Cloud Service Providers (CSPs) possess a blend of resources, scale, and expertise that is hard to match by most corporate IT operations that aren't specialized in cloud hosting.
The choice to entrust your data to a CSP is akin to giving up the driver's seat yet potentially ensuring a safer ride. Because security is central to their business, CSPs are motivated to have top-notch security measures, often surpassing what corporate IT departments can afford or prioritize given their budget constraints. So, cloud storage stands out as the savvy choice for most organizations.
In cases where things are more straightforward, the cloud provider's certifications and assurances may be all you need to rest easy. However, as the stakes rise, a deeper dive is necessary. Expert assessment and regular verification of the CSP's architecture and practices become essential to maintaining compliance.
The intricate tapestry of data residency is vast and ever-changing. Navigating its complexities requires constant vigilance and a deep understanding of the landscape. Do you feel ready to embark on this journey, or could you use a seasoned guide? If you'd like a more detailed assessment or have any questions, feel free to get in touch. After all, the route to data compliance is a journey best taken together.
About the Author
Danny Rohde has worked in consulting and technology-enabled business transformation for 20 years. As a senior practitioner at RingStone, he works with private equity firms globally in an advisory capacity. Before RingStone, Danny worked for several tier-1 consulting companies and served clients across the globe in many industries. He focuses on business process automation and digital transformation, especially in complex environments requiring the integration of people, processes, and technology change. Danny holds a postgraduate degree from Germany in International Business and a double master's degree in Management and Communication from Macquarie University Sydney. Contact Danny at firstname.lastname@example.org.