CAPTCHA. DMCA GOTCHA?
Lately there has been a great deal of news and discussion concerning “web scraping.” Web scraping is the practice of using computer software to extract information from a website. In short, a wealth of information exists on the Internet, and companies of all stripes are interested in collecting it from websites, compiling and combining it, and using it to further their business. There are even third-party companies that will scrape websites on behalf of other companies.
Scraping raises a multitude of legal issues, including issues related to privacy and security, intellectual property, and laws concerning unauthorized access to computers and trespass to chattels (in fact, the overlapping issues raised by scraping represent a very good example of what we call “information law”). Accordingly, the operator of a website being scraped may disapprove of such activity and may pursue legal action against companies that engage in scraping. Many companies would rather avoid lawsuits and attempt to stop scraping from occurring in the first instance. This can be achieved by implementing technologies such as CAPTCHA (which are becoming ubiquitous) that are intended to ensure that a human is entering the website rather than a computer software program or bot. If technologies like CAPTCHA are evaded by scrapers, some website owners might pursue an action under the anti-circumvention provisions of the Digital Millennium Copyright Act (the “DMCA”). The DMCA provides for potential statutory penalties and even criminal sanctions for violations of its anti-circumvention provisions. This post explores how the DMCA might be used in this context and looks at some cases addressing whether circumvention of CAPTCHA (and similar protocols) might result in violation of, and liability under, the DMCA.
One method for preventing scraping software from accessing information on a website is to use a challenge-response test, one of a family of protocols in which one party presents a question ("challenge") and another party must provide a valid answer ("response") to be authenticated. CAPTCHA is one such protocol (it stands for “Completely Automated Public Turing test to tell Computers and Humans Apart”). In short, when a person or computer program attempts to log into a website, the website asks for login credentials and also requires the person or computer to complete a CAPTCHA test. Typically the CAPTCHA requires the person or computer to re-type a series of letters, symbols and/or numbers that are printed in a barely legible font. The theory is that a computer program would not be able to discern the text, while a human could (even if it takes multiple attempts, and even if the person is required to listen to audio of the text read aloud in order to understand it). The end result would be humans in, computers out. Of course, those who desire to get into these websites using computer programs might be able to design such programs in a manner that evades or defeats the CAPTCHA protocol. This type of activity has actually resulted in a couple of lawsuits alleging DMCA violations (among others).
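For readers who want to see the mechanics, the challenge-response idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular website or CAPTCHA product works: the function names (issue_challenge, verify_response) and the in-memory store are hypothetical, and a real CAPTCHA would show the text as a distorted image or audio clip rather than returning it directly.

```python
import hmac
import secrets
import string

# Hypothetical in-memory store: challenge ID -> expected answer.
_pending = {}

def issue_challenge(length=6):
    """Create a challenge and return (challenge_id, text_to_display)."""
    alphabet = string.ascii_uppercase + string.digits
    answer = "".join(secrets.choice(alphabet) for _ in range(length))
    challenge_id = secrets.token_hex(8)
    _pending[challenge_id] = answer
    # Assumption: a real CAPTCHA would render `answer` in a distorted,
    # hard-to-read form before showing it to the visitor.
    return challenge_id, answer

def verify_response(challenge_id, response):
    """Return True only if the response matches the pending challenge."""
    # pop() makes each challenge single-use, whether or not it is solved.
    expected = _pending.pop(challenge_id, None)
    if expected is None:
        return False
    typed = response.strip().upper()
    # Constant-time comparison of the expected and typed answers.
    return hmac.compare_digest(expected.encode(), typed.encode())
```

In this sketch, access to the protected page would be granted only when verify_response returns True, which is the "application of information" that the cases below focus on.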
DMCA Anti-Circumvention Provisions
The DMCA anti-circumvention provisions prohibit persons and entities from circumventing technological measures that effectively control access to a copyrighted work (in this case the copyrighted work on a website). Under the DMCA, to “circumvent a technological measure” means to “descramble a scrambled work, to decrypt an encrypted work, or otherwise to avoid, bypass, remove, deactivate, or impair a technological measure, without the authority of the copyright owner.” A technological measure “effectively controls access to a work” if the measure, “in the ordinary course of its operation, requires the application of information, or a process or a treatment, with the authority of the copyright owner, to gain access to the work.” The DMCA provides a private right of action for actual damages, as well as statutory damages in the sum of not less than $200 or more than $2,500 per act of circumvention, device, product, component, offer, or performance of service, as the court considers just. In addition, a willful violation of these provisions for purposes of commercial advantage or private financial gain could result in criminal penalties (a fine of up to $500,000 for a first offense and up to $1,000,000 for a subsequent offense) and jail time (up to five years for a first offense and up to ten years for a subsequent offense).
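Because the statutory range applies per act, potential exposure scales with the number of acts a court counts. The short Python sketch below simply multiplies the $200 and $2,500 figures quoted above by a number of acts. The figure of 10,000 acts is a made-up assumption for illustration, how a court would actually count "acts" is an open question, and the award within the range is whatever "the court considers just."

```python
MIN_PER_ACT = 200    # statutory minimum per act, in dollars
MAX_PER_ACT = 2_500  # statutory maximum per act, in dollars

def statutory_damages_range(acts):
    """Return (low, high) statutory damages in dollars for a number of acts."""
    if acts < 0:
        raise ValueError("number of acts cannot be negative")
    return acts * MIN_PER_ACT, acts * MAX_PER_ACT

# Hypothetical example: an automated tool that evades a CAPTCHA 10,000 times.
low, high = statutory_damages_range(10_000)
print(f"${low:,} to ${high:,}")  # prints "$2,000,000 to $25,000,000"
```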
There are two main cases that look at this issue, the most recent of which was decided in March 2010 (see Craigslist, Inc. v. Naturemarket, Inc., 694 F. Supp. 2d 1039 (N.D. Cal. 2010); Ticketmaster L.L.C. v. RMG Technologies, Inc., 507 F. Supp. 2d 1096 (C.D. Cal. 2007)).
In the Ticketmaster case, Ticketmaster sought a preliminary injunction against RMG, and one of the causes of action alleged was a violation of the DMCA’s anti-circumvention provisions. RMG allegedly had developed a software program that allowed its customers to evade Ticketmaster’s CAPTCHA system in order to allow for the automated mass purchase of tickets. In granting Ticketmaster’s preliminary injunction, the court considered whether CAPTCHA constituted a “technological measure” (a term not defined under the DMCA):
First, the Court notes that the DMCA does not equate its use of the term "technological measure" with Defendant's terms "system" or "program." In any case, Plaintiff has submitted evidence that CAPTCHA is a technological measure that regulates access to a copyrighted work. Although the DMCA does not appear to include a definition of the term, it states that "a technological measure 'effectively controls access to a work' if the measure, in the ordinary course of its operation, requires the application of information, or a process or a treatment, with the authority of the copyright owner, to gain access to the work." When the user makes a ticket request on ticketmaster.com, CAPTCHA presents "a box with stylized random characters partially obscured behind hash marks." The user is required to type the characters into an entry on the screen in order to proceed with the request. Most automated devices cannot decipher and type the random characters and thus cannot proceed to the copyrighted ticket purchase pages. Thus, because CAPTCHA "in the ordinary course of its operation, requires the application of information . . . to gain access to the work," it is a technological measure that regulates access to a copyrighted work. Plaintiff is therefore likely to prevail on its DMCA § 1201(a)(2) claim.
The fact pattern in the Craigslist case was similar to that in Ticketmaster (and indeed the court relied in part on the reasoning in Ticketmaster). This case, however, came up in the context of a default judgment, so its precedential value may be limited. Nonetheless, the court did look at whether Craigslist stated a proper DMCA anti-circumvention claim related to evasion of the CAPTCHA process used by Craigslist. In this case the defendants provided their clients with a software service known as "CraigsList AutoPoster Professional," which included an automatic CAPTCHA bypass feature that allowed the defendants and their customers to circumvent Craigslist’s CAPTCHA security measures. In holding that Craigslist stated a valid cause of action under the DMCA, the court indicated the following:
Plaintiff owns valid copyrights in its website and the content within. This content is protected by Plaintiff's CAPTCHA software and telephone verification, both of which were circumvented by Defendants. Plaintiff has alleged that Defendants' AutoPoster Professional software, pre-verified craigslist accounts, and CAPTCHA credits each circumvent these security measures and provide unauthorized access to Plaintiff's copyrighted material. Defendants' products and services were designed primarily for the purpose of circumventing Plaintiff's CAPTCHA and telephone verification measures. Defendants thus enabled unauthorized access to and copies of copyright-protected portions of Plaintiff's website controlled by these measures—particularly the ad posting and account creation portions of the website. As such, Defendants' manufacture, marketing, and distribution of their software provided third parties unauthorized access to Plaintiff's copyrighted material. Taken together, the undersigned finds that Plaintiff has sufficiently stated a claim for violation of Section 1201(a)(2) of the DMCA. Further, because the CAPTCHA Plaintiff employs also protects Plaintiff's rights in its website—a protected work—Plaintiff has also sufficiently stated a claim under Section 1201(b)(1).
Note that both the Ticketmaster and Craigslist cases were brought against a company creating circumvention software for use by others, and do not address the direct violation that could exist for an entity actually using the software. Note also that neither decision amounts to a final judgment on the merits of whether evading CAPTCHA is a DMCA violation. Nonetheless, it does follow that if a software program that evades CAPTCHA could constitute a violation of the DMCA’s anti-trafficking provisions, it is also likely that use of that software to evade CAPTCHA could be a violation of DMCA section 1201(a)(1), which prohibits the act of circumvention itself (or at least it may be a valid allegation of such a violation).
So what does this all mean for companies engaged in scraping or desiring to engage in scraping (or having somebody else do it on their behalf)? Be careful, especially where the scraping requires the circumvention or evasion of technological measures preventing access to the website’s copyrighted works. While we are still far from answering the ultimate question as to whether evading CAPTCHA is a violation of the DMCA, the risk inherent in the DMCA's per-violation statutory damages could be high (not to mention the risk of criminal action). There is a potential multiplier effect because each circumvention of CAPTCHA could be a violation, and if this is being done automatically all the time, those violations could be very numerous. Companies that are considering engaging in these activities need to look very closely at how the scraping will be done and whether technological measures need to be circumvented in order to get the data at issue. If using a third party, they should inquire as to that party's practices in order to assess this risk (as there may be vicarious liability theories that could attach). Note that this blog post does not even address other key issues like copyright infringement, potential computer fraud and abuse claims (e.g., under the Computer Fraud and Abuse Act), and others. Those issues should also be analyzed and taken into account before engaging in these activities.