Adapt your application
The purpose of client-side (or end-to-end) encryption in an application is to prevent the application server from reading the encrypted data.
In some cases, this is a problem: a feature may only work if the server can read the data, which makes the data behind that feature ineligible for client-side encryption as is.
Depending on the case, several strategies can be put in place to make that data eligible for client-side encryption after all.
If you would like advice on which strategy to adopt, please do not hesitate to contact our teams directly.
Move the feature to the client side
Thanks in part to improvements in browsers and device performance, some functionality traditionally implemented on the server side can now be moved to the client side.
The server then no longer needs access to the data to provide the feature, which makes that data eligible for client-side encryption.
Example: File generation
A practical example is file generation, which is usually done on the server side.
A PDF file is often generated from a template into which content (text, images, etc.) is injected.
A common way to do this is with a server-side library, for example FPDF in Python.
The alternative is to move this generation to the client side using a browser library like pdf-lib, which was used, for example, in the generator of the travel exemption certificates required during the successive lockdowns in France.
In this kind of scenario, the backend never sees the PDF file in clear text, so the file is eligible for end-to-end encryption with the Seald-SDK.
Example: Search in text
Another scenario is searching the text of messages, emails, files, etc. In almost all cases, the frontend sends the query to the database or to a dedicated search service, such as Algolia (SaaS) or Elasticsearch if it is managed internally.
These services have access to the corpus of texts to be searched and have built a search "index" from it, which they query on each request to return the best results very quickly.
When the corpus is small (e.g. a message history), the index can be generated on the client side with a library like minisearch and queried directly by the browser, without any interaction with the backend. The index is then encrypted and stored on the backend for later use.
Client-side search is what desktop email clients like Mozilla Thunderbird and Microsoft Outlook use, as do instant messaging applications like WhatsApp and Signal.
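The idea of a client-side index can be sketched in a few lines. The example below is a minimal, language-neutral illustration written in Python with the standard library only; it is not how minisearch works internally, and in a real application this logic would run in the browser. All function names and the sample messages are hypothetical. The encryption of the serialized index before upload is deliberately left out: it would be handled by the encryption layer (for example the Seald-SDK) and is not shown here.

```python
import json
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(messages: dict[str, str]) -> dict[str, list[str]]:
    """Map each word to the sorted ids of the messages that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for message_id, text in messages.items():
        for word in tokenize(text):
            index[word].add(message_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index: dict[str, list[str]], query: str) -> list[str]:
    """Return the ids of the messages containing every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], []))
    for word in words[1:]:
        matches &= set(index.get(word, []))
    return sorted(matches)


def serialize_index(index: dict[str, list[str]]) -> bytes:
    """Turn the index into bytes, ready to be encrypted before upload."""
    return json.dumps(index, sort_keys=True).encode("utf-8")


def deserialize_index(blob: bytes) -> dict[str, list[str]]:
    """Rebuild the index from bytes obtained after client-side decryption."""
    return json.loads(blob.decode("utf-8"))


# Hypothetical message history, already decrypted on the client.
messages = {
    "m1": "Lunch at noon tomorrow?",
    "m2": "The contract is signed, see you tomorrow.",
    "m3": "Please send the signed contract.",
}
index = build_index(messages)
print(search(index, "signed contract"))  # ['m2', 'm3']
print(search(index, "tomorrow"))         # ['m1', 'm2']
```

Because the index itself reveals which words appear in which messages, it has to be treated as sensitive data and encrypted on the client before it is stored on the backend, exactly like the messages it was built from.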
Temporary grant of rights
When neither of the above strategies is applicable because the functionality to be developed absolutely requires server-side plaintext data, one possible strategy is to temporarily grant decryption rights to that server.
This seems counterintuitive, given that the security goal is to protect against a compromise of the server. In practice, however, a server breach is limited in time, so only the data accessible to the server during the attack is exposed.
This presupposes early detection of the breach with effective monitoring measures.
Example: End-to-end encrypted archiving
Sometimes data is analyzed once by a server (backend) to produce a result (running OCR, a machine learning model, etc.), and is then stored for a longer period during which only human users need to access it from a frontend. The result may or may not be sensitive.
An alternative is to encrypt the data immediately after analysis, so that only authorized individuals can decrypt it and the server (backend) does not keep a copy of the encryption key.
The server has the data in clear text at the very beginning of the process, but never again afterwards (nor the ability to obtain it). If the server suffered a data leak, only the data being analyzed at the time of the leak would be compromised, but not the already encrypted data.
This kind of mechanism can be implemented by integrating the Seald-SDK:
- In the backend, which would have a "service identity" used only to encrypt for the users of the frontend, or which would use the Anonymous SDK.
- In the frontend, which would allow each user to manage their own identity and decrypt data on the client side only.
Example: Granting rights afterwards for a limited period of time
The previous mechanism is only applicable when the server's use of the data is early in the data's life cycle.
When the server needs to use the data later in its life cycle, the frontend needs a mechanism to grant rights to the backend for the duration of the analysis.
This kind of mechanism can be implemented by integrating the Seald-SDK:
- in the backend, which would have a "service identity" to which the users of the frontend grant rights on some data;
- in the frontend, which would allow each user to manage their own identity, decrypt data on the client side, and grant or revoke the backend's rights to their data.
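The bookkeeping behind a time-limited grant can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `GrantRegistry` class, its methods and the identifiers used below are hypothetical, no encryption is performed, and it does not represent the Seald-SDK API. In a real deployment the rights would be enforced by the encryption layer rather than by a Python dictionary.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GrantRegistry:
    """Hypothetical record of who may decrypt which data, and until when."""

    # Maps (data_id, identity) to the UNIX timestamp at which the grant expires.
    grants: dict[tuple[str, str], float] = field(default_factory=dict)

    def grant(self, data_id: str, identity: str, duration_s: float,
              now: Optional[float] = None) -> None:
        """Give `identity` access to `data_id` for `duration_s` seconds."""
        now = time.time() if now is None else now
        self.grants[(data_id, identity)] = now + duration_s

    def revoke(self, data_id: str, identity: str) -> None:
        """Withdraw access immediately, whether or not the grant has expired."""
        self.grants.pop((data_id, identity), None)

    def can_decrypt(self, data_id: str, identity: str,
                    now: Optional[float] = None) -> bool:
        """Tell whether `identity` currently holds an unexpired grant."""
        now = time.time() if now is None else now
        expires_at = self.grants.get((data_id, identity))
        return expires_at is not None and now < expires_at


# A user grants the backend's service identity 10 minutes of access.
registry = GrantRegistry()
registry.grant("report-42", "backend-service", duration_s=600, now=1_000.0)
print(registry.can_decrypt("report-42", "backend-service", now=1_300.0))  # True
print(registry.can_decrypt("report-42", "backend-service", now=1_700.0))  # False

# The user can also revoke a grant before it expires.
registry.grant("report-42", "backend-service", duration_s=600, now=2_000.0)
registry.revoke("report-42", "backend-service")
print(registry.can_decrypt("report-42", "backend-service", now=2_100.0))  # False
```

The `now` parameter exists only to make the example deterministic; without it, the current time is used.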
TIP
To keep the diagram simple, the initial encryption of the data is not shown. It can be done either by the frontend, or by the backend if the approach described in the previous section is used.
Pseudonymize
When data is used on the server side (to perform some kind of automated analysis, ML algorithm, SQL query, search, etc.), it is common that only a portion of the collected data is used.
It is possible to apply stricter security measures to the data that is not used on the server side than to the data that is. One strategy recommended by the GDPR in Article 32 is to use "pseudonymization".
This consists of dividing the dataset into two:
- the "non-identifying" data used on the server side;
- the "identifying" data that is not used on the server side but that must be kept for future use.
During the partitioning of the data, a unique random identifier, called a "pseudonym", is generated for each entry in the dataset. The same pseudonym is attached to that entry in both the "identifying" and the "non-identifying" data, so that the partition operation is reversible.
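The partition and its inverse can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the field names and the sample records are hypothetical, and where each half is stored or how it is protected is outside the scope of the sketch.

```python
import uuid


def pseudonymize(records: list[dict], identifying_fields: set[str]):
    """Split records into identifying and non-identifying halves.

    Both halves are keyed by the same random pseudonym, so that the
    split can be reversed by whoever can read both of them.
    """
    identifying: dict[str, dict] = {}
    non_identifying: dict[str, dict] = {}
    for record in records:
        pseudonym = uuid.uuid4().hex  # unique random identifier for this entry
        identifying[pseudonym] = {
            key: value for key, value in record.items()
            if key in identifying_fields
        }
        non_identifying[pseudonym] = {
            key: value for key, value in record.items()
            if key not in identifying_fields
        }
    return identifying, non_identifying


def reconstruct(pseudonym: str, identifying: dict, non_identifying: dict) -> dict:
    """Rebuild the complete entry from its two halves."""
    return {**identifying[pseudonym], **non_identifying[pseudonym]}


# Hypothetical dataset: name and email identify a person, the rest does not.
records = [
    {"name": "Alice Martin", "email": "alice@example.com", "age": 34, "plan": "pro"},
    {"name": "Bob Durand", "email": "bob@example.com", "age": 51, "plan": "free"},
]
identifying, non_identifying = pseudonymize(records, {"name", "email"})

for pseudonym, data in non_identifying.items():
    print(pseudonym, data)  # usable for analysis, contains no name or email
```

The pseudonym is generated randomly rather than derived from the identifying fields (for example by hashing an email address), so that it cannot be recomputed from a guess of those fields.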
Usually, the pseudonymization operation is performed as follows:
- first, the backend has the complete data and performs the pseudonymization;
- then, the backend stores the data in two separate databases, with different perimeter security measures (access control, authentication, etc.) so that the identifying data can only be read by the people authorized to reconstruct the complete data;
- finally, when an authorized person wishes to reconstruct the complete data, one only needs to retrieve the entry corresponding to a pseudonym in both databases.
A simple separation like this one poses several problems:
- since the pseudonymization operation is performed on the server side, the server has had access to the complete data and could have kept a copy of it (even unintentionally);
- if the two databases are on the same server and are separated only by distinct authentication measures, a malicious system administrator or a breach of the server would expose the complete data.
An alternative is to encrypt the identifying data on the client side for authorized persons only, and to store both the encrypted identifying data and the non-identifying data on the backend.
In such a mechanism, identifying and non-identifying data are properly partitioned:
- the non-identifying data is readable and usable by the backend to provide the desired functionality;
- the identifying data is readable only by authorized persons, after client-side decryption in the frontend; the backend only has access to an encrypted version for which it does not have the key.
One way to perform such an encryption is to use the Seald-SDK.