THOUGHTS OVER MY MORNING COFFEE : Google keeps messing with search, now hiding away an important feature

Google appears to hide away an important feature from its search engine—an easily accessible cache of search results. It’s still there, if you know where to look.

1 February 2024 – Last week I wrote a post, “Why stuff does not work”, lamenting the fact that modern technology lacks the zing of the past, that it has gone off the rails. The main reason? Well, as in so many things (too many): money.

I used search as one example.

I noted two things are important in this comparison of the “old” tech and the “new” tech deployed by the estimable Google outfit. 

Number one: Search in Google’s early days made an attempt to provide content relevant to the query. The system was reasonably good, but it was not perfect. Messrs. Brin and Page fancy danced around issues like disambiguation, date and time data, date and time of crawl, and forward and rearward truncation. If you have read any of the white papers that explain the anatomy, architecture and history of Google’s search engine you know all this stuff. Many of my readers in my contingent of eDiscovery industry subscribers used this architecture in building their search templates.

Flash forward to the present day, and the massive shift by Prabhakar Raghavan and the other Googlers “in charge of search” who now deliver irrelevant information. Search has been reduced to “what is our top revenue driver?” To find useful material, you need to navigate to a Google Dorks service and use those tips and tricks. Otherwise, forget it and give Swisscows.com, StartPage.com, or Yandex.com a whirl.

You are correct. I don’t use the usual “smart” Web search engines. I am a dinobaby, and I don’t want thresholds set by a 20-year-old filtering information for my commercial benefit. Ok, his employer’s commercial benefit. Thanks, but no thanks.

Number two: search today is a monopoly. It takes specialized expertise to find useful, actionable, and accurate information. Most people – even those with law degrees, MBAs, a tech background (or the ability to copy and paste code) – cannot cope with provenance, verification, validation, and informed filtering performed by a subject matter expert. Bullshit does not work in my corner of the world. Bullshit does not serve my team. “Good enough” does not work for me.

As Cory Doctorow wrote, we are witnessing the “enshittification” of platforms. We’re living through the “enshittocene” of platforms. The internet was colonized by platforms, and all these platforms simply degraded quickly and thoroughly in a three-stage process:

  • First, platforms are good to their users
  • Then they abuse their users to make things better for their business customers
  • Finally, they abuse those business customers to claw back all the value for themselves. Then, they die.

And Google continues to degrade.

I have to imagine that Google did not make a lot of money from people pinging its search engine for cached website results, but making it convenient to access was a service to researchers and journalists.

It was also somewhat of a service to society. Often, when information-related scandals broke – such as content with egregious errors, evidence of deleted social media statements, or information at risk of disappearing in short order – it was a great backstop that worked more effectively than the Internet Archive for capturing fresh information.

And yet, for some reason, Google has treated this feature as if it were embarrassed by it. Over the years, it has increasingly buried the feature in its search interface, making it harder and harder to find, even though I find it just as useful as it was the day it launched.

Recently, the company started removing it entirely, something uncovered by the SEO sphere’s closest thing to an investigative journalist, Barry Schwartz of Search Engine Roundtable. As he writes:

After a couple of months of testing, it seems Google has now removed the cache link from the search results page. I no longer see a link to the Google cache within search result snippets, but that doesn’t mean you cannot access [the] cache, you can. Now when you click the three dots for more information for a search result snippet, the cache button is missing.

To be clear, the cache is not gone—it is simply hidden from public view. (I don’t see it on my end, either.) You can access it manually by typing in a specialized URL, along the lines of:

https://webcache.googleusercontent.com/search?q=cache:gregorybufithis.com
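
If you look up cached pages often, the URL pattern above is easy to script. Here is a minimal sketch in Python. It only builds the address and does not fetch anything. The function name build_cache_url is my own invention, and stripping a leading "http://" or "https://" is an assumption I made so the result matches the bare-domain form shown above. Whether Google keeps answering at this address is not something the code can promise.

```python
from urllib.parse import quote

# Endpoint taken from the example URL in this post
CACHE_ENDPOINT = "https://webcache.googleusercontent.com/search"


def build_cache_url(page: str) -> str:
    """Build a cache lookup URL for a page (hypothetical helper, not a Google API)."""
    # Assumption: drop a leading scheme so the output matches the bare-domain form above
    for scheme in ("https://", "http://"):
        if page.startswith(scheme):
            page = page[len(scheme):]
            break
    # Percent-encode characters that would break the query string; leave "/" and ":" readable
    return f"{CACHE_ENDPOINT}?q=cache:{quote(page, safe='/:')}"


print(build_cache_url("gregorybufithis.com"))
```

Paste the printed address into a browser, or swap in any other page you want to check.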

But because Google chooses not to present it to people as they’re searching, you can’t point your cursor in that general direction. Defaults remain essential to how we experience computers, and we fail to do enough to ensure they stay put.

The removal of this feature represents multiple things. First, a changing, more dynamic internet in which more content is built with the requirement of JavaScript, something that is largely incompatible with the caching feature.

And second, a decision built with the bottom line in mind – after all, if you click a website accessed through Google’s cache, you don’t pull up the ads from Google AdSense that generally appear when you look at the normal version of the site. So no “$ka-ching$!!” for Google.

Now, to be clear, Google is still caching pages all over the internet. After all, caching is a key element of the AMP initiative that has shaped much of how people use its search engine over the past near-decade. Google just doesn’t want you to use it as an alternative to seeing the actual search result.

As I have noted before, Google’s results are actually getting worse in an era of generative AI, and this is just more evidence.

And as a journalist, I need to think and work differently when it comes to search. For instance, I am never dealing with static databases but with dynamic ones, so existing database management programs and eDiscovery software do not work. It is beyond their capability. You can’t just go at it with classic tools. There’s nothing in the market for journalists that can ingest so much data. You need machine learning toolkits in Python, plus software such as Neo4j, Linkurious, Fonduer and Scikit-learn, that allow you to do deep dives and serious search.

On platforms, if I’m trying to uncover something or follow a thread that seems to be falling apart by the second, I don’t want a cache tool like Google’s taken away from me.

But in the last year, we have increasingly seen these tools degraded and damaged by large companies that have decided they’re no longer in their best interest. For example, Meta’s Threads has discouraged chronological search, while the site I call Twitter has made it harder to find links on external platforms. I would often search for YouTube videos on Twitter to understand their virality but I can’t do that anymore.

There are valid arguments against making such tools easily accessible. After all, if abuse or brigading is a major factor on a given platform, as Meta has long had to deal with on Instagram, having immediate access to recent comments opens up bullying opportunities.

And maybe having those cached sites so easily accessible invited some kind of legal pressure that Google didn’t want to deal with anymore.

But this is just another indication we can’t trust valuable features in the hands of large companies anymore. They too often shape and frame the way we access and manage data – and control the entire “Infoplex”.

And when they feel like taking it away, that’s what they’ll do.