In an AI world, data is the new gold. Without immense troves of genuine and factual data to train AI models on, the AI tools being used today would be completely useless.
OpenAI, the company behind ChatGPT, has repeatedly caught flak over where it sourced ChatGPT’s training data. The company has been accused of copyright infringement, allegedly using protected material from well-known authors, The New York Times, and millions of YouTube videos.
Unfortunately, AI companies are running out of data. As a result, they are willing to (a) pay immense sums to any company that can supply original user data and (b) change their terms to gain wider access to user data already stored on their servers, including unredacted PDFs.
The data gold rush
One of the best illustrations of how valuable data has become is Reddit’s stock market debut.
When a company begins trading on the public stock market, it goes through an initial public offering (IPO), the first sale of its shares to the public. The process is involved and includes a valuation by underwriters to set the company’s initial share price.
Reddit priced its IPO at $34 a share, raising $748 million. The company had been pitching investors on its immense collection of user data as a source of AI training material, and by the end of its first day of trading its shares had risen 48%. Two months later, Reddit’s shares hit a record high of $74.90 after it inked a deal with OpenAI to let ChatGPT access the company’s data.
Here are a few more examples of how highly companies value data right now:
- Meta, the company that owns Facebook, Instagram, and WhatsApp, recently announced that it would train its own AI models on publicly available user data, and offered an extremely complex opt-out procedure for users who objected.
- Adobe made waves recently when ambiguous language in its terms suggested that it might start using users’ private creations to train its AI models. The company eventually backtracked and overhauled its terms in response to user backlash.
The list goes on, with companies from Twitter to Instacart to Zoom updating their terms in ways that suggest they will start mining user data for AI training. Even the Federal Trade Commission (FTC), the US consumer protection agency, publicly warned companies that quietly changing their terms in this way could be unfair or deceptive under consumer protection law.
Unfortunately, as with everything on the internet, once data is out there, you can’t take it back. That’s bad enough for private data, but it can be catastrophic for confidential information in PDFs, such as business agreements, court documents, NDAs, and other high-value material that tech companies are hungry for.
Examples of PDF redaction blunders
Poorly redacted PDFs are even worse than unredacted ones, because they create a false sense of security.
Improperly redacted PDFs were a problem even before AI. In one study, researchers found thousands of supposedly redacted PDFs online that exposed sensitive details, including people’s names, because they hadn’t been redacted properly. A Wired article on the topic found two popular online redaction tools to be hopelessly inadequate: users could simply copy and paste the “redacted” text to reveal it.
And if a user can copy and paste the text, any software that reads the file can extract it too, including tools that collect documents for AI training.
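To see why copy-and-paste exposes “covered” text, it helps to know that a PDF page is drawn from a content stream of operators, and a cover-up redaction can simply paint a filled rectangle on top of text that is still there. The minimal Python sketch below assumes a hypothetical, uncompressed content stream (real files are usually compressed and use more varied text operators) and a deliberately naive, hypothetical `extract_text` helper that only looks for literal strings shown with the `Tj` operator. Even that is enough to recover the hidden name.

```python
import re

# Hypothetical, uncompressed page content stream from a "black box" redaction.
# The text is drawn first (BT ... Tj ... ET), then a filled black rectangle
# (0 0 0 rg = black fill, x y w h re = rectangle, f = fill) is painted over it.
covered_stream = b"""
BT
/F1 12 Tf
72 700 Td
(Client name: Jane Doe) Tj
ET
0 0 0 rg
140 695 80 16 re
f
"""

def extract_text(content_stream: bytes) -> list[str]:
    """Naive extractor: return every literal string shown with the Tj operator.

    Real extractors also handle TJ arrays, escape sequences, font encodings,
    and compressed streams; this sketch covers only the simple case above.
    """
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", content_stream)]

print(extract_text(covered_stream))  # → ['Client name: Jane Doe']
```

The black rectangle changes only what is rendered on screen; the string inside the `Tj` operator is untouched, and that string is exactly what copy-and-paste and text-extraction tools read.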
In 2014, The New York Times published a poorly redacted PDF that exposed the name of an NSA agent. In 2016, the media discovered a plaintiff’s name because of a poorly redacted court document. In 2020, a poorly redacted PDF exposed national security information in Canada. In 2021, the EU published a poorly redacted contract for its deal with pharmaceutical company AstraZeneca that revealed confidential elements of the contentious deal.
AI makes the problem worse, because such errors are no longer one-off incidents. Once an AI model has ingested the data, it can potentially surface it in response to any future prompt, significantly increasing the risk for the people the data relates to.
Ainon does PDF redaction properly—and affordably
Until now, the only company offering reasonably reliable PDF redaction has been Adobe, the original creator of the PDF format.
Using Adobe for PDF redaction has three problems:
- Its PDF redaction tool is expensive enough to put it out of reach of many users.
- The recent scandal over Adobe’s ambiguous terms raises the question: Can we trust Adobe itself not to use unredacted PDFs for AI training?
- Its redaction features are limited compared to Ainon’s.
We built the Ainon PDF redaction tool to fix the shortcomings of current PDF redaction tools, at a price everyone can afford. Ainon offers both pay-as-you-go and subscription plans, giving you best-in-class redaction features that guarantee complete confidentiality for the redacted areas of your document.
Some of Ainon’s key features are:
- Text selection: Ainon lets users select multiple instances of the same word or group of text and redact them all in one click. Ainon’s Smart Redact tool automatically finds potentially sensitive data in any document, which users can also redact in a single click.
- Total redaction: Redacted text is removed completely, not just “covered” by a black box. Ainon keeps the original PDF in a separate .ainon file that you can edit later, but the redacted version contains no confidential data whatsoever.
- Image blurring: We apply the same “total redaction” approach to images, blurring them completely so that the underlying image cannot be recognized.
- Redacting handwriting: Our advanced AI detects words stored as images, including handwritten words. You can also drag a box around any element to redact it.
- Text replacement: Ainon makes it easy to replace any text, for example by globally replacing a name with “[redacted]”.
- Text translation: As a bonus feature, you can use Ainon to translate PDF documents for people who don’t speak the language of the original document.
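The difference between removing text and merely covering it can be illustrated on plain text. The Python sketch below is not Ainon’s implementation: it assumes the page’s text has already been extracted into a string, uses two simple regular expressions (email addresses and US-style phone numbers) as a hypothetical stand-in for automatic detection such as Smart Redact, adds a user-selected term, and replaces every occurrence with “[redacted]”. The helpers `find_sensitive` and `redact` are hypothetical names for illustration. A real PDF tool would also have to rewrite the page’s content streams and handle fonts, metadata, and images, which this sketch ignores.

```python
import re

# Hypothetical detection patterns standing in for automatic sensitive-data
# detection. A real tool would need far more robust detection than this.
SENSITIVE_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # rough match for email addresses
    re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),    # US-style phone numbers
]

def find_sensitive(text: str, extra_terms: list[str]) -> set[str]:
    """Collect every pattern match plus any user-selected terms found in the text."""
    hits = {m.group(0) for p in SENSITIVE_PATTERNS for m in p.finditer(text)}
    hits.update(t for t in extra_terms if t in text)
    return hits

def redact(text: str, terms: set[str], replacement: str = "[redacted]") -> str:
    """Replace every occurrence of every term, longest first, so the original
    characters are removed from the output rather than hidden."""
    for term in sorted(terms, key=len, reverse=True):
        text = text.replace(term, replacement)
    return text

page = "Contact Jane Doe at jane.doe@example.com or 555-123-4567. Jane Doe signed on 3 May."
terms = find_sensitive(page, extra_terms=["Jane Doe"])
clean = redact(page, terms)
print(clean)  # → Contact [redacted] at [redacted] or [redacted]. [redacted] signed on 3 May.
```

Because the matched characters are replaced rather than painted over, any text extraction run on the result has nothing to recover. Replacing longer terms first keeps a shorter term (say, “Jane”) from partially overwriting a longer one that contains it (“Jane Doe”) and leaving fragments behind.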
We know the pitfalls of poor redaction software firsthand, which is why we created Ainon in the first place. Now, with AI gobbling up data at every turn, reliable PDF redaction has never been more urgent, and Ainon gives you exactly that.
To try Ainon for free, sign up for an account here.