Tremendous progress has been made over the last several years in protecting sensitive data in transit and in storage. But sensitive data may still be vulnerable when it is in use. For example, consider transparent data encryption (TDE). While TDE ensures sensitive data is protected in storage, that same sensitive data must be held in cleartext in the database buffer pool so that SQL queries can be processed. This leaves the data vulnerable: its confidentiality may be compromised in several ways, including memory-scraping malware and privileged user abuse.
This concern around protecting data in use has been the primary reason holding back many organizations from saving on IT infrastructure costs by delegating certain computations to the cloud and from sharing private data with their peers for collaborative analytics. Confidential computing and fully homomorphic encryption (FHE) are two promising emerging technologies for addressing this concern and enabling organizations to unlock the value of sensitive data. What are these, and what are the differences between them?
Use cases for data in use protection
Until recently, sharing private data with collaborators and consuming cloud data services have been constant challenges for many organizations. For some, the value derived from sharing data with collaborators and consuming cloud data services justifies accepting the risk that private data may be vulnerable while it is in use. For other organizations, such a trade-off is not acceptable. What if organizations were not forced to make this trade-off? What if data could be protected not only in transit and in storage but also in use? This would open the door to a variety of use cases:
- Secure database processing for the cloud: Cloud database services employ transport layer security (TLS) to protect data as it transits between the database server and client applications. They also employ a variety of database encryption techniques to protect data in storage. However, when it comes to database query processing, the data must reside in the main memory in cleartext. FHE can be used to perform query processing directly on encrypted data, thus ensuring sensitive data is encrypted in all three states: in transit, in storage and in use. Confidential computing does not enable query processing on encrypted data but can be used to ensure that such computation is performed in a trusted execution environment (TEE) so that sensitive data is protected while it is in use.
- Secure data sharing for collaborative analytics: In the financial industry, organizations have a need to share private data with their peers to help prevent financial fraud. In the health care industry, organizations need to share private data to treat patients and develop cures for new diseases. In such cases, organizations struggle with how to derive the desired outcome from sharing private data while still complying with data privacy laws. FHE can be used to address this dilemma by performing the analytics directly on the encrypted data, ensuring that the data remains protected while in use. Confidential computing can be used to ensure that the data is combined and analyzed within the TEE so that it is protected while in use.
- Saving IT costs by delegating computation to the cloud: Financial institutions train and deploy machine learning (ML) models to better understand their clients and tailor specific products for them. For example, the marketing department might want to understand a client’s propensity to take out a loan within the next three months and tailor an offer for them. Financial institutions might want to save on storage costs by moving clients’ data to cheaper cloud storage and running the analytics there. However, this poses a problem for both the privacy of the clients’ data and the privacy of the ML models themselves. FHE can be used to address this challenge by encrypting the ML models and running them directly on encrypted data, ensuring both the private data and ML models are protected while in use. Confidential computing protects the private data and ML models while in use by ensuring this computation is run within a TEE.
- Strengthening adherence to zero trust security principles: As attacks on data in transit and in storage are countered by standard protection mechanisms such as TLS and TDE, attackers are shifting their focus to data in use. Attack techniques that target data in use include memory scraping, hypervisor and container breakout, and firmware compromise. FHE and confidential computing strengthen adherence to zero trust security principles by removing the implicit trust that applications would otherwise need to place in the underlying software stack to protect data in use.
Confidential computing
Sensitive data may be vulnerable during computation, as it typically resides in the main memory in cleartext. Confidential computing addresses this concern by ensuring that computation on such sensitive data is performed in a TEE, which is a hardware-based mechanism that prevents unauthorized access or modification of sensitive data.
Figure 1: Protecting Sensitive Data During Computation Using Secure Enclaves.
Intel Software Guard Extensions (SGX) is one widely known example of a confidential computing technology. It enables an application to define a private region of main memory, called a secure enclave, whose content cannot be read or written by any process outside the enclave, regardless of its privilege level or central processing unit (CPU) mode. This isolation protects the enclave even when the operating system (OS), hypervisor and container engine are compromised. In addition, the enclave memory is encrypted with keys stored within the CPU itself. Decryption happens inside the CPU, and only for code running within the enclave. This means that even if a malicious entity were to physically steal the enclave memory, its contents would be of no use to them.
Two approaches to confidential computing
Today, two main approaches are used for confidential computing: application software development kits (SDKs) and runtime deployment systems. The Intel SGX capability mentioned above is one example of the application SDK-based approach. In this approach, the developer is responsible for dividing the application into untrusted code and trusted code. The untrusted code runs normally on the OS, while the trusted code runs within the secure enclave. The SDKs provide the necessary application programming interfaces (APIs) to create and manage secure enclaves.
The Open Enclave SDK is another example of the application SDK-based approach. It is an open-source SDK that provides a level of abstraction to enable developers to build TEE-based applications once and deploy them on multiple hardware platforms. The application SDK-based approach allows for better scrutiny of the trusted code since this is less code to review, but it does require changes to the application.
The goal of the runtime deployment system-based approach is to enable applications to run in a TEE without having to rewrite them for a particular hardware platform or SDK. Examples of solutions in this category include IBM Secure Execution for Linux (IBM z15 and LinuxONE III) and the open-source project Enarx. Cost reduction and time to value are the two biggest advantages of the runtime deployment system-based approach. However, deploying applications without any modifications may prevent them from taking advantage of other features, such as attestation, unless such applications have already been coded with that in mind.
Fully homomorphic encryption
You can rely on traditional encryption schemes such as the advanced encryption standard (AES) for protecting data in transit and in storage. But they do not enable computation on encrypted data. In other words, data must be first decrypted before it can be operated upon. During this ‘data in use’ state, sensitive data can be vulnerable. FHE addresses this problem by enabling computation directly on encrypted data. So, what exactly is homomorphic encryption, and what makes a homomorphic encryption scheme fully homomorphic?
A homomorphic encryption scheme supports some form of computation on encrypted data. For instance, given a cleartext input x and its encrypted value E(x), it should be possible to compute E(f(x)) for some function f, without having access to x or any other secret information. In this context, many of the well-known encryption schemes exhibit some homomorphic properties. For example, with RSA, a cleartext input x is encrypted as $E(x) = x^e \bmod m$, where e is a public exponent and m is a public modulus. It is easy to see that, given two ciphertexts $E(x) = x^e \bmod m$ and $E(y) = y^e \bmod m$, encrypting two cleartext inputs x and y, we can multiply them together and obtain $(xy)^e \bmod m$, which is the encrypted value of xy. This means that RSA is homomorphic for multiplication.
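The multiplicative property of RSA described above can be checked with a few lines of Python. This is a minimal sketch using textbook (unpadded) RSA with tiny, insecure parameters chosen purely for illustration; the primes, the exponent and the helper names `encrypt` and `decrypt` are assumptions and do not come from any particular library.

```python
# Textbook RSA with toy parameters (illustrative only, not secure).
p, q = 61, 53                       # hypothetical small primes
m = p * q                           # public modulus
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

def encrypt(x):
    """E(x) = x^e mod m"""
    return pow(x, e, m)

def decrypt(c):
    """D(c) = c^d mod m"""
    return pow(c, d, m)

x, y = 7, 12
# Multiply the two ciphertexts without ever decrypting them.
product_ct = (encrypt(x) * encrypt(y)) % m

# The product of the ciphertexts is a valid encryption of x * y.
assert product_ct == encrypt(x * y)
assert decrypt(product_ct) == x * y
```

The property holds only as long as the product xy stays below the modulus m. Real RSA deployments add randomized padding, which deliberately breaks this homomorphism.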
Consider the Paillier encryption scheme. With Paillier, a cleartext input x is encrypted as $E(x) = g^x r^m \bmod m^2$, where g is the base, m is the modulus and r is random. It is easy to see that given two ciphertexts $E(x) = g^x r^m \bmod m^2$ and $E(y) = g^y s^m \bmod m^2$, encrypting two cleartext inputs x and y, we can multiply them together and obtain $g^{x+y} (rs)^m \bmod m^2$, which is the encrypted value of x + y. This means that Paillier is homomorphic for addition. A homomorphic encryption scheme that supports only multiplication or only addition is called a partially homomorphic encryption scheme.
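The additive property of Paillier can be sketched in the same way. The example below assumes the common simplification g = m + 1 and uses tiny, insecure primes; the decryption helper and all names (`lam`, `mu`, `random_unit`) are illustrative choices, not an API from an existing library.

```python
import math
import random

# Toy Paillier parameters (illustrative only, not secure).
p, q = 61, 53                       # hypothetical small primes
m = p * q                           # public modulus
m2 = m * m
g = m + 1                           # common simplified choice of base
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1), secret

def L(u):
    return (u - 1) // m

mu = pow(L(pow(g, lam, m2)), -1, m)  # secret, used for decryption

def random_unit():
    """Pick a random r that is invertible modulo m."""
    while True:
        r = random.randrange(2, m)
        if math.gcd(r, m) == 1:
            return r

def encrypt(x):
    """E(x) = g^x * r^m mod m^2"""
    r = random_unit()
    return (pow(g, x, m2) * pow(r, m, m2)) % m2

def decrypt(c):
    return (L(pow(c, lam, m2)) * mu) % m

x, y = 15, 27
# Multiplying ciphertexts adds the underlying cleartexts.
sum_ct = (encrypt(x) * encrypt(y)) % m2
assert decrypt(sum_ct) == x + y
```

Because r is random, encrypting the same value twice yields different ciphertexts, yet the product of two ciphertexts always decrypts to the sum of the cleartexts (modulo m).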
Early attempts at homomorphic encryption
Early schemes that supported both multiplication and addition, such as DGHV, had a limit on the number of operations that could be carried out on encrypted data. Therefore, these were called somewhat homomorphic encryption schemes. In these schemes, a ‘noise’ is added during encryption for security purposes.
It turned out that this noise grows with each addition or multiplication operation. Eventually, the noise can become so significant that the ciphertext can no longer be decrypted correctly. In contrast, an FHE scheme is one that supports an unbounded number of multiplications and additions on encrypted data.
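The effect of noise growth can be illustrated with a deliberately simplified, symmetric toy scheme over the integers. It is written in the spirit of integer-based schemes such as DGHV but is not the actual DGHV construction: the key size, the fixed noise value and the function names are assumptions made for this sketch. A bit is hidden as a random multiple of the secret key plus a small even noise plus the bit, and decryption works only while the accumulated noise stays below the key.

```python
import random

P = 1001  # secret key: an odd integer (toy size, insecure)

def encrypt(bit, r=5):
    """c = q*P + 2*r + bit, where 2*r + bit is the 'noise' term."""
    q = random.randrange(1, 1000)
    return q * P + 2 * r + bit

def noise(c):
    """Noise carried by a ciphertext (only the key holder can compute it)."""
    return c % P

def decrypt(c):
    """Correct only while the true noise is smaller than P."""
    return (c % P) % 2

# Addition of ciphertexts adds the noise terms: 11 + 10 = 21.
xor_ct = encrypt(1) + encrypt(0)
assert decrypt(xor_ct) == 1

# Multiplication multiplies the noise terms: 11 * 11 = 121, still below P.
c2 = encrypt(1) * encrypt(1)
assert noise(c2) == 121 and decrypt(c2) == 1

# One more multiplication: 121 * 11 = 1331 exceeds P and wraps around.
c3 = c2 * encrypt(1)
assert noise(c3) == 1331 % P
assert decrypt(c3) == 0   # wrong: the cleartext product 1 * 1 * 1 is 1
```

In this toy, addition grows the noise slowly while multiplication grows it quickly, so only a small number of multiplications can be evaluated before decryption fails.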
The breakthrough came in 2009 with Craig Gentry of IBM. His lattice-based encryption scheme was the first plausible FHE scheme. The critical idea in Gentry’s work is called bootstrapping. Bootstrapping refers to the process of refreshing a ciphertext in order to produce a new ciphertext that encrypts the same data, but with a lower level of noise so that more homomorphic operations can be evaluated on it.
Bootstrapping
Conceptually, bootstrapping can be thought of as decrypting the ciphertext with the secret key and then re-encrypting the data, except that the party performing the refresh does not know the secret key. Instead, it uses an encryption of the secret key, called the bootstrapping key, and carries out the decryption step homomorphically. Bootstrapping is the core of most FHE schemes known to date.
Figure 2 illustrates how FHE can be used to delegate computation on sensitive data to the cloud while still maintaining full control of data privacy.
Figure 2: Fully Homomorphic Encryption.
FHE is a form of asymmetric encryption, thus the use of a public key (pk) and a secret key (sk) as shown in the figure. Alice encrypts her data with the secret key sk and shares her public key pk with the cloud service, where it is used in the evaluation of function f on the encrypted data. When she receives the result, Alice uses her secret key to decrypt it and obtain f(x).
How do FHE schemes work?
FHE schemes can be divided into three main categories depending on how they model computation:
- Boolean Circuits: Expresses computation as Boolean circuits; this is most suitable for number comparisons. Examples of such schemes include GSW and TFHE.
- Modular Arithmetic: Expresses computation as integer arithmetic; this is most suitable for integer arithmetic and scalar multiplication. Examples of such schemes include BGV and BFV.
- Approximate Arithmetic: Expresses computation as floating-point arithmetic; this is most suitable for polynomial approximations and ML models. Examples of such schemes include CKKS.
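To see why polynomial approximations matter for ML workloads, note that arithmetic-based schemes evaluate additions and multiplications, so a non-polynomial function such as the logistic (sigmoid) function is typically replaced by a polynomial. The sketch below runs on plain floats, not on ciphertexts, and uses the degree-3 Taylor expansion around zero as one possible approximation; the function names are illustrative, and production systems may choose other polynomials and input ranges.

```python
import math

def sigmoid(x):
    """Exact logistic function (needs exp and division)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_poly(x):
    """Degree-3 Taylor approximation: 1/2 + x/4 - x^3/48.

    Uses only additions and multiplications by constants or by x,
    which is the kind of computation arithmetic FHE schemes support.
    """
    return 0.5 + 0.25 * x - (x * x * x) * (1.0 / 48.0)

# The approximation is close for inputs near zero and degrades further out.
for x in (-1.0, -0.5, 0.0, 0.5, 1.0):
    assert abs(sigmoid(x) - sigmoid_poly(x)) < 0.01
```

The practical consequence is that inputs usually need to be scaled into the range where the chosen polynomial is accurate.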
Several open-source implementations of FHE exist today. They include IBM HElib, PALISADE and Microsoft SEAL. Some schemes, such as CKKS, are available in all three libraries, but others may not be available across all of them. For example, BGV is available only in IBM HElib and PALISADE.
Comparing FHE and confidential computing
FHE and confidential computing are both emerging technologies for protecting data in use. They help ensure the confidentiality of sensitive or private data while it is in use. FHE is based on cryptography; therefore, its security is mathematically provable. Confidential computing, on the other hand, is based on TEEs, so its security rests on the hardware implementation and is not mathematically provable. For example, while a TEE provides a high level of security through hardware-based isolation, it cannot protect against side-channel attacks.
While FHE provides stronger privacy guarantees, it cannot guarantee the integrity of code execution. This is where confidential computing excels. By running code within a TEE, confidential computing provides stronger guarantees when it comes to the integrity of code execution. Therefore, FHE and confidential computing should not be viewed as competing solutions, but as complementary.
The next frontier
Protecting data in use is the next frontier for data security. It enables organizations to save on IT infrastructure costs by delegating computation to the cloud in confidence. It also opens the door for collaborative analytics over private data while still complying with privacy mandates. Confidential computing and FHE are key emerging technologies for protecting data in use and enabling those use cases. From a timeline perspective, confidential computing is more likely to be the technology that will be widely adopted first, particularly the runtime deployment system type, as this does not require any application changes. Some initial examples of this are available today, such as the IBM Data Shield offering on IBM Cloud or the Always Encrypted database on Microsoft Azure.
FHE has made tremendous progress over the last decade, but it needs to evolve beyond low-level cryptographic libraries to facilitate its use and adoption in creating new applications. Some important steps in this direction are being made. For example, the recently announced IBM HElayers SDK enables running artificial intelligence workloads on encrypted data without having to understand the low-level cryptographic underpinnings. The IBM HElayers SDK includes a Python API that enables application developers and data scientists to use the power of FHE by supporting a wide array of analytics, such as linear regression, logistic regression and neural networks.
CTO for Data Security, IBM