People — and by people I mean non-journalists here, normals — have some pretty wild misconceptions about how reporters and editors do their work. To take one example: Only 58 percent think that, when reporters use anonymous sources, they actually know who the source is; most of the rest think the journalists themselves have no idea who they’re talking to. Around a quarter of Americans surveyed aren’t aware of the difference between a reporter and a columnist, or between a news story and a press release.
But one thing that both journalists and their audiences tend to agree on is that there’d be more trust in the whole process if news organizations let readers know more about the sourcing and evidence they cite in stories. (About two-thirds of both groups agree.)
It’s in that spirit that The New York Times’ R&D group has launched The News Provenance Project, which aims to establish “a set of signals that can travel with published media anywhere that material is displayed: on social media, in group chats, in search results and emails, and so on” in order that an end user could verify its, well, provenance. Here’s Sasha Koren:
In a time of heightened political polarization and widespread social media use, the prevalence of misinformation online is a persistent problem, with increasingly serious effects on elections and the stability of governments around the world. In addition to false statements published as fact in text and photos that have been manipulated or republished out of context, instances of manipulated video are now on the rise. How should news organizations respond to this crisis?
Our first project is focused on photojournalism. Because photos can be easily manipulated — and then circulate widely through digital spaces with few brakes applied from social platforms, messaging apps or search engines — we are aiming to learn what happens when we give audiences better insight about the information associated with a news photo published online.
To that end, we are approaching this task with a hypothesis: that adding context to images might have a positive or clarifying effect on the wide ecosystem of information published to the web.
Around that hypothesis, we are conducting user research, which we’ll use as the basis for a proof of concept. We’ll test the effectiveness of that proof of concept to find out whether access to that information helps audiences better understand the veracity of professionally produced photojournalism. Some examples of what we hope to learn:
- Could information about a photo’s digital history help people better understand the way it is produced and published?
- How much information might be helpful or necessary in sourcing a photo shared outside of its published context?
- What kinds of metadata — for example, the time and place the photo was captured, the original publisher and caption, the photo’s revision history — might be important to include or prioritize?
- How helpful might a symbol or watermark be in establishing credibility?
- How might access to photo metadata change how audiences perceive photos that don’t have metadata?
So when you see a photo on Facebook or Twitter that claims to be from The New York Times and seems to show some sort of outrageous activity — say, a politician engaged in some sort of illegal or improper behavior — a tool made by the News Provenance Project might be able to let you check that the photo really is a Times photo. It might let you look at metadata — either created through software at the time a photo was taken or added by a photo editor down the line — that tells you about its origins. Maybe it could even record the path the image took in the editing process — any brightening, sharpening, cropping, or other Photoshoppy changes.
So how are they gonna do this? I’ll give you a hint: It starts with “b” and ends in “lockchain.”
All hype aside, blockchain offers mechanisms for sharing information between entities in ways we think are essential for establishing and maintaining provenance of digital files. Specifically, data stored to a blockchain is immutable (read: tamper-proof), and copies of the database can be held by multiple parties.
Why blockchain? Its underlying structure as a “distributed ledger” (a database that is not housed on one set of servers owned and operated by one entity, but by many entities and servers that are kept updated simultaneously) is useful for this project because it makes the records of each change traceable: files are not so much changed as built upon. Any updates to what is published are recorded in a sequential string (or “blocks” in a “chain”) with the string of those changes adding up to create a provenance…
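That “blocks in a chain” idea is simpler than the hype suggests. Here’s a minimal sketch in Python of how chained hashes make a record history tamper-evident — this is illustrative only, not the News Provenance Project’s actual design, and the record fields are invented for the example:

```python
import hashlib
import json

def block_hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with the previous block's hash,
    so each block commits to the entire history before it."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(records: list) -> list:
    """Append each record as a new block; nothing is overwritten."""
    hashes, prev = [], "0" * 64  # placeholder "genesis" hash
    for record in records:
        prev = block_hash(record, prev)
        hashes.append(prev)
    return hashes

# Hypothetical provenance events for one photo.
records = [
    {"event": "captured", "caption": "Original caption"},
    {"event": "cropped", "caption": "Original caption"},
    {"event": "republished", "caption": "Updated caption"},
]
chain = build_chain(records)

# Tampering with an early record changes every later hash —
# that cascade is what makes the history traceable.
doctored = [dict(r) for r in records]
doctored[0]["caption"] = "Doctored caption"
tampered = build_chain(doctored)

print(chain[-1] == tampered[-1])  # False: the final hashes diverge
```

Because each block’s hash folds in the previous one, you can’t quietly rewrite an old entry: every hash downstream of the change stops matching the copies held by other parties.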
By experimenting with publishing photos on a blockchain, we might in theory provide audiences with a way to determine the source of a photo, or whether it had been edited after it was published.
The Times is being clear about wanting to create something that will be of use to the entire industry: “A successful implementation will require collaboration and use among many organizations. To that end, we’ll make what we learn publicly available in the hopes that it may be of interest and of use to other publishers.” They aim to have a proof of concept to share by the end of the year.
This is a good and noble thing the Times is doing; tracing provenance and creating audit trails have been core promises of blockchain technology for years. A startup named Mediachain Labs aimed to create something similar for all forms of digital content; it was acquired in 2017 by Spotify, which has an obvious interest in being able to track where a particular music file originated.
That said…I do want to make one thing clear, because I’ve seen it come up in sloppy reporting around another blockchain journalism project, Civil. The blockchain does not prove the “accuracy” of anything. It does not “verify” the truth of anything. It is not true that, with blockchain “there would never be fake news.”
If a Civil newsroom’s story is hashed on the blockchain, what that can prove is that (a) it really was published by that newsroom at a certain time, and (b) that the version of the story you’re looking at and the original version either are or are not identical. That’s it. It can’t verify that story’s facts are accurate, that the reporter operated without bias, that the people he quoted weren’t figments of his imagination, or anything else that we associate with “fake news.”
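To make the limits of that concrete, here’s a small sketch — not Civil’s or the Times’ actual implementation, with made-up story text — of what a hash recorded at publication time can and can’t tell you:

```python
import hashlib

def fingerprint(text: str) -> str:
    """SHA-256 digest of a story's text — the kind of value a
    newsroom would write to a chain at publication time."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical: the digest recorded when the story was published.
published = "Mayor announces new budget on Tuesday."
recorded_digest = fingerprint(published)

# Later, a reader checks the copy in front of them.
copy_a = "Mayor announces new budget on Tuesday."
copy_b = "Mayor announces new budget on Friday."

print(fingerprint(copy_a) == recorded_digest)  # True: byte-identical
print(fingerprint(copy_b) == recorded_digest)  # False: altered somewhere
```

Note what the matching digest does not tell you: whether the mayor actually announced a budget. The hash proves identity with the published version, nothing about the truth of its contents.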
For example, Forbes announced last fall that it would start publishing metadata about its articles on a blockchain. “Publishing the metadata will indelibly establish the author’s identity and credibility, as well as the expert nature of participating sources,” a Forbes press release said. But of course recording something on the blockchain does nothing to prove its “credibility” or the “expert nature of participating sources,” delibly or indelibly. It only proves Forbes published it — and as we’ve been reminded recently, that is hardly an ironclad guarantee of accuracy.
The same is true of the News Provenance Project, which, laudably, doesn’t use the dread phrase “fake news” on its site or in its introductory post. Remember its narrow and realistic stated hope: We might in theory provide audiences with a way to determine the source of a photo, or whether it had been edited after it was published.
So if the News Provenance Project is successful — if it gets cooperation from the platforms through which these images are distributed, if it can solve problems around resizing, metadata editing tools, and other things that can break the link between the image and the information about it — and a faked photo on 4chan claims to show Pete Buttigieg and Kamala Harris in flagrante delicto and the poster says it ran in the Times, someone will be able to check and find out that no, it didn’t.
That’s cool, and probably worthwhile for a host of reasons (licensing, syndication, internal usage records) that have nothing to do with battling misinformation. But it’s also a vanishingly small share of what we think of as “fake news.”
People pushing Facebook memes about Seth Rich weren’t resting their proof on the fact that The Washington Post published a particular photo on a particular day. When the Denver Guardian published “FBI AGENT SUSPECTED IN HILLARY EMAIL LEAKS FOUND DEAD IN APPARENT MURDER-SUICIDE,” the people fooled weren’t mistaken about whether it was really published by the Denver Guardian — they were under the mistaken impression that the Denver Guardian was a real news site (it was not). And of course the vast, vast majority of fake images online don’t claim provenance from an elite media company in the first place.
So whenever you see a headline that seems to say blockchain is going to “end” fake news, “fix” fake news, “solve” fake news, “stop” fake news, “prevent fake news from spreading,” be “the only real solution to fake news,” or even “rein in the new post-truth world” — remember that the claim being made by people who understand the technology is much, much narrower than that.
And anyway, maybe blockchain should focus on solving the Israeli-Palestinian conflict first.