Text-only view of web pages

I have added another tool to my site: View Text-only version of a web page!

Please note: it may take a couple of minutes until a request is processed! I'll come back to why later.

It is very handy if you want to see the semantic structure of a page, or get a rough idea of how search engines, for example, would see your page. Personally I think a page should look perfectly structured without scripts, CSS, etc., and only then should the display be modified. Accessibility and SEO are probably the two most important reasons for this.

Previously I used the text-only version of Google Cache, but nowadays Google manual penalties seem to be a bit more frequent, resulting in total de-indexing of sites, and for those sites my previous method no longer worked. So I created this little tool. It is nothing fancy, actually pretty much a test version (as you can tell from the result-page URL), and I'd say it is even a bit slow, but it does the job, so it's available.
The source code: hmmm, not for the moment. If you want it, contact me, no problem, but it is just not tidy enough to be shared publicly, so I'm still working on it as and when I have time.

In a nutshell:

  1. You can set the basic stream context variables on the page (referer, user-agent); the HTTP protocol version is fixed at 1.1
  2. It loads the HTML as a DOMDocument
  3. Strips the following tags: link, style, img, script, iframe, input. I guess I should also have stripped frames, but they are not supported in HTML5 anyway, so it would be foolish to use them.
  4. Returns the modified HTML document
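
To make the four steps above a bit more concrete, here is a minimal sketch of the same idea. It is not the tool's actual source: it is written in Python with only the standard library, it uses an event-based parser (html.parser) instead of a DOM document, and all names in it (build_request, fetch_html, TextOnlyFilter, strip_to_text_only, the example URL) are made up for illustration. It also drops HTML comments, which the steps above do not mention.

```python
from html import escape
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Tags from step 3, split by whether they can wrap content.
VOID_STRIP = {"link", "img", "input"}            # no content, just drop the tag
CONTAINER_STRIP = {"style", "script", "iframe"}  # drop the tag and what is inside


def build_request(url, user_agent, referer):
    """Step 1: attach the user-chosen headers (hypothetical helper)."""
    return Request(url, headers={"User-Agent": user_agent, "Referer": referer})


def fetch_html(url, user_agent, referer):
    """Fetch the page; urllib speaks HTTP/1.1 by default."""
    with urlopen(build_request(url, user_agent, referer), timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TextOnlyFilter(HTMLParser):
    """Steps 2 and 3: parse the HTML and re-emit it without the stripped tags."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a stripped container tag

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINER_STRIP:
            self.skip_depth += 1
            return
        if tag in VOID_STRIP or self.skip_depth:
            return
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in CONTAINER_STRIP:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if tag in VOID_STRIP or self.skip_depth:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # keep the doctype


def strip_to_text_only(html):
    """Step 4: return the modified HTML document as a string."""
    parser = TextOnlyFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Usage would look something like `print(strip_to_text_only(fetch_html("https://example.com/", "Mozilla/5.0", "https://example.com/")))`, where the URL and header values are placeholders. Splitting the tags into "void" and "container" groups is a design choice of this sketch: for style, script and iframe the content has to go as well, otherwise CSS and JavaScript source would end up as visible text.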

So: nothing fancy, but it does the job. If you need any other features or improvements, contact me, and as soon as I have reasonably tidy and maybe slightly quicker source code I'll share it here!

All the best,

Balazs