Disclaimer: This post discusses general legal issues, but it does not constitute legal advice in any respect.
Web scraping (also referred to as crawling or spidering) is the automated process of gathering data from another person’s website. But is this practice a fine method for mining competitor data, or is it just a legal pitfall waiting to bite? We discuss below:
Simply put, web scraping might be one of the best ways to aggregate content from across the internet, but it comes with a caveat: It’s also one of the hardest tools to parse from a legal standpoint. For the uninitiated, web scraping is a process whereby an automated piece of software extracts data from a website by “scraping” through the site’s many pages. While search engines like Google and Bing do a similar task when they index web pages, scraping engines take the process a step further and convert the information into a format which can be easily transferred over to a database or spreadsheet.
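To make the "extract, then convert" step concrete, here is a minimal sketch using only the Python standard library. The sample markup, the `title` and `price` class names, and the `scrape_to_csv` helper are all assumptions made up for illustration; a real site's structure will differ, and a real scraper would fetch the page first rather than read an inline string.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup: the "title" and "price" class names are assumptions.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="title">Widget</span><span class="price">9.99</span></li>
  <li class="product"><span class="title">Gadget</span><span class="price">24.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collects (title, price) pairs from the assumed markup above."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (title, price) rows
        self._field = None   # field the parser is currently inside, if any
        self._current = {}   # fields gathered for the current <li>

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("title", "price"):
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            self.rows.append((self._current.get("title"), self._current.get("price")))
            self._current = {}


def scrape_to_csv(html):
    """Parse the HTML and return spreadsheet-ready CSV text."""
    parser = ProductParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["title", "price"])
    writer.writerows(parser.rows)
    return out.getvalue()


print(scrape_to_csv(SAMPLE_HTML))
```

The CSV text can be opened directly in a spreadsheet or loaded into a database table, which is the step that sets a scraper apart from a search engine's indexer.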
It’s also important to note that a web scraper is not the same as an API. While a company might provide an API to allow other systems to interact with its data, the quality and quantity of data available through APIs is typically lower than what is made available through web scraping. In addition, web scrapers can often provide more up-to-date information than APIs and are much easier to customize from a structural standpoint.
The applications of this “scraped” information are widespread. A reporter like Nate Silver might use scrapers to monitor baseball statistics and create numerical evidence for a new sports story he’s working on. Similarly, an eCommerce business might bulk scrape product titles, prices, and SKUs from other sites in order to further analyze them.
While web scraping is an undoubtedly powerful tool, it’s still undergoing growing pains when it comes to legal matters. Because the scraping process appropriates pre-existing content from across the web, there are all kinds of ethical and legal quandaries that confront businesses that hope to leverage scrapers for their own processes.
In this “wild west” environment, where the legal implications of web scraping are in a constant state of flux, it helps to get a foothold on where the legal needle currently falls. The following timeline outlines some of the biggest cases involving web scrapers in the United States, and allows us to achieve a greater understanding of the precedents that surround the court rulings.
For years after they first came into use, web scrapers went largely unchallenged from a legal standpoint. In 2000, however, the use of scrapers came under heavy and consistent fire when eBay fired the first shot against an auction data aggregator called Bidder’s Edge. In this very early case, eBay argued that Bidder’s Edge was using scrapers in a way that violated Trespass to Chattels doctrine. While the lawsuit was settled out of court, the judge upheld eBay’s original injunction, stating that heavy bot traffic could very well disrupt eBay’s service.
Then in 2003’s Intel Corp. v. Hamidi, the California Supreme Court overturned the basis of eBay v. Bidder’s Edge, ruling that Trespass to Chattels could not extend to the context of computers if no actual harm to personal property occurred.
So in terms of legal action against web scraping, Trespass to Chattels no longer applied, and things were back to square one. This began a period in which the courts consistently rejected Terms of Service as a valid means of prohibiting scrapers, including cases like Perfect 10 v. Google, and Cvent v. Eventbrite.
The Takeaway: The earliest cases against scrapers hinged on Trespass to Chattels law, and were successful. However, that doctrine is no longer a valid approach.
2009: Facebook Steps In
In 2009, Facebook turned the tides of the web scraping war when Power.com, a website which aggregated multiple social networks into one centralized site, included Facebook in their service. Because Power.com was scraping Facebook’s content instead of adhering to their established standards, Facebook sued Power on grounds of copyright infringement.
In denying Power.com’s motion to dismiss the case, the judge ruled that scraping can constitute copying, however momentary that copying may be. And because Facebook’s Terms of Service don’t allow for scraping, that act of copying constituted an infringement on Facebook’s copyright. With this decision, the waters regarding the legality of web scrapers began to shift in favor of the content creators.
The Takeaway: Even if a web scraper skips over infringing content on its way to freely usable content, it might qualify as copyright infringement by virtue of having technically “copied” the infringing content first.
2011-2014: U.S. v Auernheimer
In 2010, hacker Andrew “Weev” Auernheimer found a security flaw in AT&T’s website, which would display the email addresses of users who visited the site via their iPads. By exploiting the flaw using some simple scripts and a scraper, Auernheimer was able to gather thousands of emails from the AT&T site.
Although these email addresses were publicly available, Auernheimer’s exploit led to his 2012 conviction, where he was charged with identity fraud and conspiracy to access a computer without authorization.
Earlier this year, the court vacated Auernheimer’s conviction, ruling that the trial’s New Jersey venue was improper. But even though the case turned out to be mostly inconclusive, the court noted the fact that there was no evidence to show that “any password gate or code-based barrier was breached.” This seems to leave room for the web scraping of publicly-available personal information, although it’s still very much open to interpretation and not set in stone.
The Takeaway: Using a web scraper to aggregate sensitive personal information can lead to a conviction, even if that information was technically available to the public. While there is hope in the court’s observation that no passwords or barriers were violated to retrieve this information, the waters here are still very volatile.
2013: Associated Press vs. Meltwater
Meltwater is a software company whose “Global Media Monitoring” product uses scrapers to aggregate news stories for paying clients. The Associated Press took issue with Meltwater’s scraping of their original stories, some of which had been copyrighted. In 2012, AP filed suit against Meltwater for copyright infringement and hot news misappropriation.
While it’s already been established that facts cannot be copyrighted, the court determined that the AP’s copyrighted articles (and more specifically, the way in which the facts within those articles were arranged) were not fair game for copying. On top of this, Meltwater’s use of the articles failed to meet the established fair use standards, and could not be defended on that front either.
The Takeaway: Fair use is limited when it comes to web scrapers, and copyrighted content is not always open to be scraped.
2014: QVC vs. Resultly
In 2014 QVC (the well known TV retailer) and Resultly (a startup shopping app) got into a legal battle over what QVC termed Resultly’s “excessive crawling” of their site. QVC’s complaint further alleged that Resultly disguised its web crawlers to mask its source IP address, which prevented QVC IT personnel from quickly blocking the unwanted crawlers. And while Resultly’s automated crawlers were aggressive enough to overload QVC’s servers, causing outages that cost QVC around $2M in revenue, the courts ruled that Resultly didn’t act to cause intentional harm to QVC’s site. On the contrary, the court had this to say:
“Resultly was not QVC’s competitor, a disgruntled QVC employee, or an unhappy QVC customer aiming to cause harm to QVC’s server. To the contrary, Resultly’s purpose was to grow a loyal user base of people who gain something from being directed to QVC’s website.”
The Takeaway: You should always be protecting your site against crawlers, not only because this is a no-brainer IT practice, but because the legalities concerning web scraping are still so murky that businesses can’t reasonably expect to be bailed out by the courts.
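As one illustration of what protecting a site against aggressive crawlers can look like, the sketch below counts recent requests per client in a sliding time window and refuses clients that exceed a limit. The `CrawlMonitor` class, its thresholds, and the idea of keying on an IP address are assumptions for the example, not a description of what QVC or any other company runs. As the QVC complaint suggests, a crawler that masks its source IP can slip past a check like this, so treat it as one layer rather than a complete defense.

```python
from collections import defaultdict, deque


class CrawlMonitor:
    """Refuses clients that exceed max_requests within window_seconds."""

    def __init__(self, max_requests=5, window_seconds=1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> recent request timestamps

    def allow(self, client_id, now):
        """Return True if the request at time `now` (in seconds) is within the limit."""
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Usage: a hypothetical client sends four quick requests, then one a second later.
monitor = CrawlMonitor(max_requests=3, window_seconds=1.0)
results = [monitor.allow("203.0.113.7", t) for t in (0.0, 0.1, 0.2, 0.3, 1.05)]
print(results)  # [True, True, True, False, True]
```

Passing the timestamp in as an argument keeps the example deterministic; in a live server it would typically come from `time.monotonic()`.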
By closely observing the outcomes of previous rulings, you’ll find that there are a few guidelines that a scraper should attempt to adhere to:
- Content being scraped is not copyright protected
- The act of scraping does not burden the services of the site being scraped
- The scraper does not gather sensitive user information
- The scraped content adheres to fair use standards
- Always, ALWAYS do your own due diligence to block scraping if this activity is unwanted on your site; the courts’ decisions are too volatile to expect a favorable outcome
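The guideline about not burdening the scraped site can be sketched in code. The example below uses Python's standard `urllib.robotparser` to skip disallowed paths and to pick a pause between requests. The robots.txt content, the `example-bot` user agent, and the helper names are hypothetical, and honoring robots.txt is shown here only as a courtesy measure; it does not settle any of the legal questions discussed above.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download the site's own file.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /account/
"""


def build_rules(robots_text):
    """Parse robots.txt text into a rules object."""
    rules = RobotFileParser()
    rules.parse(robots_text.splitlines())
    return rules


def polite_fetch_plan(rules, urls, user_agent="example-bot", default_delay=1.0):
    """Return (url, delay) pairs for the URLs that the robots rules allow."""
    # Fall back to default_delay when the file declares no Crawl-delay.
    delay = rules.crawl_delay(user_agent) or default_delay
    return [(url, delay) for url in urls if rules.can_fetch(user_agent, url)]


def crawl(plan, fetch):
    """Call fetch(url) for each planned URL, pausing after each request."""
    for url, delay in plan:
        fetch(url)
        time.sleep(delay)


rules = build_rules(ROBOTS_TXT)
plan = polite_fetch_plan(
    rules,
    ["https://example.com/products", "https://example.com/account/settings"],
)
print(plan)  # [('https://example.com/products', 2)]
```

Here `fetch` is left as a parameter so that any HTTP client can be plugged in; the account page is dropped from the plan because the assumed rules disallow it.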
While all of these guidelines are important to understand before using scrapers, there are other ways to acclimate to the legal nuances. In many cases, you’ll find that a simple conversation with a business software developer or consultant will lead to some satisfying conclusions: Odds are, they’ve used scrapers in the past and can shed light on any snags they’ve hit in the process. And of course, talking with a lawyer is always an ideal course of action when treading into questionable legal territory.