Data Integrity and Evidence in the Cloud

How does cloud computing affect the risks of lost, incomplete, or altered data? Often, the discussion of this question focuses on the security risks in transmitting data over public networks and storing it in dispersed facilities, sometimes in the control of diverse entities. Less often recognized is the fact that cloud computing, if not properly implemented, may jeopardize data integrity simply in the way that transactions are entered and recorded. Questionable data integrity has legal as well as operational consequences, and it should be taken into account in due diligence, in contracting, and in references to standards when adopting cloud computing solutions.

Consider a traditional business data transaction such as recording a customer order or a new hire. Sales or human resources staff, or possibly data entry clerks, type required information into an application hosted on premises. The data may be stored in multiple local databases. For example, the customer screen presented by an ERP (enterprise resource planning) system may automatically populate fields in separate order fulfillment, accounting, and customer relationship management systems, and perhaps in a marketing database as well. The new hire screen may feed relevant data to human resources, accounting, and payroll systems or modules.

The interaction between the data entry system and the multiple databases is normally effected through database APIs (application programming interfaces) designed or tested by the database vendors. The input is also typically monitored on the fly by a database “transactions manager” function designed to ensure, for example, that all required data elements are entered and are within prescribed parameters, and that they are all received by the respective database management systems.

Cloud computing solutions, by contrast, are often based on data entry via web applications. HTTP, the protocol that carries web traffic, was not designed to support transaction management or to confirm complete delivery of upstream data. Some cloud computing vendors essentially ignore this issue, while others offer solutions such as application APIs on one end or the other, or XML-based APIs that can monitor the integrity of data input over HTTP.
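
As a rough illustration of what "monitoring the integrity of data input" can mean at the application level, the sketch below wraps an order in an envelope that carries a SHA-256 digest, and the receiving side rejects anything that arrives altered, truncated, or missing required fields. The envelope format, the field names, and the use of JSON rather than XML are all assumptions made for brevity; this is not any particular vendor's API.

```python
import hashlib
import json

# Hypothetical list of fields that every order must carry.
REQUIRED_FIELDS = ("customer_id", "item", "quantity")

def make_envelope(order):
    """Sender side: serialize the order and attach a digest of the body."""
    body = json.dumps(order, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"body": body, "sha256": digest}

def verify_envelope(envelope):
    """Receiver side: check the digest first, then the required fields."""
    body = envelope.get("body", "")
    if hashlib.sha256(body.encode("utf-8")).hexdigest() != envelope.get("sha256"):
        return False, "digest mismatch: body altered or truncated"
    order = json.loads(body)
    missing = [f for f in REQUIRED_FIELDS if f not in order]
    if missing:
        return False, "missing fields: " + ", ".join(missing)
    return True, "accepted"

sent = make_envelope({"customer_id": 42, "item": "widget", "quantity": 3})
accepted = verify_envelope(sent)

# Simulate a body that was cut short in transit.
truncated = dict(sent, body=sent["body"][:-5])
rejected = verify_envelope(truncated)

# Simulate an intact message that simply lacks required data.
incomplete = verify_envelope(make_envelope({"customer_id": 42}))

print(accepted, rejected, incomplete)
```

A check like this only detects a bad submission. It does not by itself coordinate several systems, which is the subject of the transaction properties discussed next.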

Since the 1980s, database management systems routinely have been designed to incorporate the properties of “ACID” (atomicity, consistency, isolation, and durability). The question for the customer is whether a particular cloud computing solution offers similar fail-safe controls against dangerously incomplete transactions and records.

With apologies to IT professionals who understand this subject much better than I do as a lawyer, here is what is entailed in ensuring data integrity in a database management system:

Atomicity means that the transaction is aborted unless all required data elements are successfully recorded in all required systems. The transaction must entirely succeed or entirely fail. As a consequence, no payment will be sent without an associated taxpayer identification number, and HR records will not be missing any personnel who show up in the payroll system. Human error, device failure, software bugs, communications or power outages – whatever the source of a failure, if the required data are not all recorded by every system for which they are intended, then none of the data are recorded; the transaction fails and must be restarted. This is relatively easy to control when all of the input comes through a single data entry system with defined APIs and is distributed simultaneously to the relevant on-premises database applications. But in cloud computing, the data are typically entered via web browser and may go to separate vendors – for outsourced HR information management and payroll services, for instance – and there may be no immediate cross-check between them. Data successfully recorded on one of the systems may nevertheless fail to be recorded on the other, and the error may not be discovered immediately.

This may be a good reason to use a single cloud service provider for related applications, or to employ a cloud services aggregator that offers some intermediate transaction management functions.
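
For readers who want to see the all-or-nothing behavior concretely, here is a minimal sketch using Python's built-in sqlite3 module. It assumes, for simplicity, that the HR and payroll records live in two tables of one database, which is exactly the easy case described above; the table and column names are hypothetical. Spanning separate vendors would require some additional coordinating layer that this sketch does not attempt.

```python
import sqlite3

def make_db():
    """Create an in-memory database with hypothetical HR and payroll tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hr (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("CREATE TABLE payroll (emp_id INTEGER PRIMARY KEY, tax_id TEXT NOT NULL)")
    conn.commit()
    return conn

def record_new_hire(conn, emp_id, name, tax_id):
    """Write the new hire to both tables, or to neither."""
    try:
        # 'with conn' commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.execute("INSERT INTO hr VALUES (?, ?)", (emp_id, name))
            conn.execute("INSERT INTO payroll VALUES (?, ?)", (emp_id, tax_id))
        return True
    except sqlite3.IntegrityError:
        return False

def count(conn, table):
    """Return the number of rows in one of the two known tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn = make_db()
ok = record_new_hire(conn, 1, "Ada", "123-45-6789")
# The tax ID is missing, so the payroll insert fails and the HR insert is undone.
failed = record_new_hire(conn, 2, "Bob", None)

print(ok, failed, count(conn, "hr"), count(conn, "payroll"))
```

After the failed attempt, neither table contains a record for the second employee, so the HR and payroll data cannot drift apart.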

Consistency means that a database remains in a consistent state before and after the transaction. If a data field (say, Social Security Number) requires nine digits, the database must contain only nine-digit numbers in that field before and after the transaction. If other data records refer to that field, it cannot be deleted without deleting those records or taking some other action that maintains the consistency of the database schema.

Using separate vendors in the cloud to manage different but related databases may result in inconsistencies, particularly where one data record necessarily refers to another.
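
The two consistency rules mentioned above (a nine-digit field, and records that refer to other records) can be sketched as database constraints. The example below again uses sqlite3 with hypothetical table names; it assumes both tables sit in one database, since a constraint of this kind cannot by itself reach across two vendors' systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        ssn TEXT NOT NULL
            CHECK (length(ssn) = 9 AND ssn NOT GLOB '*[^0-9]*')
    )""")
conn.execute("""
    CREATE TABLE paycheck (
        check_id INTEGER PRIMARY KEY,
        emp_id INTEGER NOT NULL REFERENCES employee(emp_id)
    )""")

def try_sql(sql, params=()):
    """Run one statement; report whether the database accepted it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

good_ssn = try_sql("INSERT INTO employee VALUES (1, ?)", ("123456789",))
bad_ssn = try_sql("INSERT INTO employee VALUES (2, ?)", ("12345",))
paid = try_sql("INSERT INTO paycheck VALUES (100, 1)")
# Deleting the employee would leave the paycheck pointing at nothing.
orphaning_delete = try_sql("DELETE FROM employee WHERE emp_id = 1")

print(good_ssn, bad_ssn, paid, orphaning_delete)
```

The short number is refused, and so is the deletion that would orphan the paycheck record, so the database is in a consistent state both before and after each attempt.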

Isolation is the principle that one operation in a database system should not affect others until the transaction is complete, so that one function is not confused by an intermediate step in another function. This is why database management systems use scheduling algorithms to isolate functions and process them in the proper sequence.

Cloud computing should not threaten this principle so long as a complete transaction is processed on the same device or array, or at least subject to the same scheduling algorithm.
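
A small experiment can make isolation visible. In the sketch below, two connections to the same SQLite file stand in for two users of one database: one writes an order but has not yet committed, and the other looks at the table in the meantime. The file name and table are hypothetical, and SQLite's locking is only one of several ways a database can provide isolation.

```python
import os
import sqlite3
import tempfile

def visible_orders(conn):
    """Count the orders this connection can currently see."""
    return len(conn.execute("SELECT id FROM orders").fetchall())

def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "orders.db")
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            writer.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
            writer.commit()

            # The INSERT opens a transaction that stays open until commit().
            writer.execute("INSERT INTO orders VALUES (1, 'widget')")
            seen_by_writer = visible_orders(writer)
            seen_before_commit = visible_orders(reader)

            writer.commit()
            seen_after_commit = visible_orders(reader)
        finally:
            writer.close()
            reader.close()
    return seen_by_writer, seen_before_commit, seen_after_commit

result = demo()
print(result)
```

The writer sees its own pending order, while the other connection sees nothing until the transaction is complete, which is the behavior the isolation principle calls for.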

Durability means that the transaction record will persist once it is successfully created and the user is so notified. A common way of ensuring this result is to create a transactions log, which allows the database manager to return the database to a pre-failure state.

A cloud solution should similarly offer the capability of logging user transactions, even if the transaction data are then sent on to different locations or vendors.
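
The idea of a transactions log can be sketched in a few lines. The example below appends each transaction to a file and forces it to disk before acknowledging success, then rebuilds the state from the log as if after a restart. The record layout and file name are hypothetical, and real database logs are considerably more elaborate; this is only meant to show the principle.

```python
import json
import os
import tempfile

def log_transaction(log_path, record):
    """Append one record and force it to disk before acknowledging it."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())
    # Only at this point would the user be told the transaction succeeded.
    return True

def replay(log_path):
    """Rebuild the state from the log, for example after a failure."""
    state = {}
    if not os.path.exists(log_path):
        return state
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            state[record["order_id"]] = record["amount"]
    return state

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "transactions.log")
    log_transaction(path, {"order_id": "A1", "amount": 250})
    log_transaction(path, {"order_id": "A2", "amount": 90})
    # Nothing is held in memory here; the state comes entirely from the log.
    recovered = replay(path)

print(recovered)
```

Because the acknowledgment comes only after the record is on disk, anything the user was told had succeeded can be recovered from the log.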

Cloud computing is new enough that not all vendors have satisfactorily incorporated these data integrity principles in their solutions. Moreover, customers sometimes use such a variety of service providers that no single one of them takes responsibility for ensuring data integrity at the level of data entry and transaction management.

Over time, more cloud service providers may refer to developing standards such as the SNIA Cloud Data Management Interface (CDMI) specification and other SNIA cloud storage standards, the Data Integrity Field (DIF) standard (which, among other things, verifies input-output addresses to avoid misplacing data entered in the cloud), WS-Reliability (an OASIS standard for reliable message delivery in web services) and WS-Transaction (OASIS protocols for coordinating distributed applications), as well as XML-based solutions that add some transaction management functionality to web applications. As these standards and solutions mature, it may be appropriate to make them contractual.

These approaches would help the customer feel more confident that good data gets into cloud databases, stays there, and comes out of the cloud in the same shape.

The lawyer in me recognizes that this sort of confidence must also be communicated to government agencies, courts, and juries. Records processed and stored in the cloud may become evidence, and the strength of evidence depends largely on its reliability. I was involved once in litigation over a web marketing campaign, where the website transaction logs were so badly maintained and so insecure that it was nearly impossible to ascertain what the customer really owed the marketing company under the contract.

Reliable business records are necessary to collect a bill, prove an obligation, comply with government requirements, or establish a sequence of disputed events. If there are serious questions about data integrity in the systems routinely used by the business, the company may find its position badly undermined.

Once litigation is launched or threatened, the cloud customer will need to put a “litigation hold” on relevant data, even if it is in the hands of an outsourced service provider, and the customer will typically need the service provider’s assistance in locating and producing electronically stored information (ESI) stored outside the party’s premises. (See my colleague Tanya Forsheit’s recent discussion on preserving and retrieving ESI in the cloud.) But the service provider may be needed not only to help find, preserve, and deliver relevant records as ESI but also to establish their bona fides.

Parties presenting claims or defenses in court have long relied on the “business records exception” to the hearsay rule, a descendant of the English common law “shop book rule,” to present records of transactions in court.  The principle is reflected in Rule 803(6) of the US Federal Rules of Evidence and in similar state provisions. Rather than bringing witnesses into court to testify from direct experience and memory about every material aspect of a disputed transaction, a party can produce the records that it routinely keeps in its business, and these are presumed to be reliable. The presumption can be rebutted with evidence to the contrary -- which conceivably could occur in the case of a badly executed cloud computing strategy with poor assurances of data integrity.

Business records are not entirely self-authenticating, and there are sometimes disputes over their source and custody, or whether they have been altered. Typically, business records must be introduced from the party’s custody along with testimony that identifies the records and authenticates them as records regularly made and kept in the course of the party’s business.

Thus, it could become necessary in some cases to call for testimony from an employee of the cloud services provider to authenticate data produced from an outsourced application and a shared data storage facility, and to counter any challenges about the possibility of lost or altered data. The cloud computing service provider should be able to demonstrate that its procedures for recording transactions, associating them accurately with the author, date, and time, and storing the data securely, are consistent and effective, and that they comport with industry standards or common industry practices.

These issues are not entirely new or restricted to cloud computing, of course. To take an example, email has been central to business communications for more than twenty years, and emails are often used as evidence of transactions, conduct, and intentions in a wide variety of civil and criminal legal proceedings. Yet emails are not always reliably sourced or date-stamped, and they often reside in multiple locations, including individuals’ laptops, desktops, and phones, servers on the premises of the sender’s and receiver’s organizations, and the facilities of third-party service providers, ISPs, and webmail operators. In each case, the messages may or may not be backed up or archived onto other computers or storage media. As a result, investigators and lawyers are often in the position of searching out and comparing multiple instances of what appears to be the same email, and courts sometimes have to rule on the likelihood that a critical message was in fact sent by the purported author, received by the intended recipient, retained or deliberately deleted, or even altered in substance. (Celebrated cases involving Martha Stewart and the White House come to mind.) Lorraine v. Markel Am. Ins. Co., 241 F.R.D. 534 (D. Md. 2007) is an example of a court losing patience with parties that merely attached emails to court papers without authenticating testimony to establish their source and reliability.

As transactions databases and other kinds of business records follow email into the cloud, we are likely to see more disputes over records authentication and reliability. This suggests that customers should seek out cloud computing service providers that offer effective data integrity as well as security. Customers should also consider inserting a general contractual obligation for the service provider to cooperate as necessary in legal and regulatory proceedings -- because sometimes integrity must be proven.