Wednesday, April 26, 2017

Using Web Sites Programmatically with PhantomJS

It's harder to tell with the modern Internet audience, but the general function of the Internet has always been the retrieval of information.  The standard mechanism has been some type of browser, and that goes back to the original MOSAIC.  An alternative is to skip the browser step and retrieve a Web page without personally visiting it, and you can accomplish that with PhantomJS.  As the name implies, it's scripted in JavaScript.  (Scotch:  What is PhantomJS and How is it Used?)

Note:  Dr Rick Prairie introduced me to MOSAIC way, way back in Cincinnati, sometime in the early '90s.  He was one of the most jammin' PhDs you could ever meet, and he saw its potential instantly.


PhantomJS is a headless WebKit scriptable with a JavaScript API. It has fast and native support for various web standards: DOM handling, CSS selector, JSON, Canvas, and SVG.

The above definition may be ambiguous, in simple terms, PhantomJS is a web browser without a graphical user interface.

- Scotch

In simpler terms, that grammar was bloody rubbish, but you see their point.


One of the biggest problems with Web pages is that they're so cluttered with crap that the actual content can be overwhelmed by it.  PhantomJS offers a programmatic way to defeat that by retrieving whatever you need from the page while ignoring the rest of it.

After you have obtained that information, you can do whatever you like with it, but the first move is to lose that 'headless browser' idea.  For this kind of work it's more useful to think of PhantomJS not as a browser but as an I/O method which may be convenient for your purposes.  All I/O is performed within a driver script which you run on your own machine with the PhantomJS executable.
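A minimal sketch of what such a driver might look like, assuming PhantomJS is installed and using its documented webpage and system modules.  The file name fetch.js, the tidy helper and the default 'h1' selector are illustrative choices only, not anything from Scotch or from tested use here.

```javascript
// fetch.js (hypothetical name): run with
//   phantomjs fetch.js <url> <css-selector>

// Pure helper (illustrative): collapse runs of whitespace so the
// extracted text prints on one clean line.
function tidy(text) {
  return String(text).replace(/\s+/g, ' ').replace(/^ | $/g, '');
}

// Only touch the PhantomJS API when actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var page = require('webpage').create();

  var url = system.args[1];
  var selector = system.args[2] || 'h1'; // assumed default selector

  page.open(url, function (status) {
    if (status !== 'success') {
      console.log('Could not load ' + url);
      phantom.exit(1);
      return;
    }

    // This function runs inside the loaded page, so it sees only the
    // page's DOM and the arguments passed in after it.
    var texts = page.evaluate(function (sel) {
      var nodes = document.querySelectorAll(sel);
      var out = [];
      for (var i = 0; i < nodes.length; i++) {
        out.push(nodes[i].textContent);
      }
      return out;
    }, selector);

    // Everything else on the page is simply never printed.
    for (var i = 0; i < texts.length; i++) {
      console.log(tidy(texts[i]));
    }
    phantom.exit();
  });
}
```

The selector is the whole trick:  it names the part of the page you want, and the clutter around it never reaches your output.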


For example, one application for such code would be building an index of all the articles in Ithaka, and there are thousands of them.  Google does not provide any kind of index, so the next move is to make it yourself, assuming no one has created one already, and I haven't seen one.
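A rough sketch of how the index idea could go, under loud assumptions:  the archive page URL comes from the command line, the selector 'h3.post-title a' is only a guess at how post titles are marked up, and index.js and formatIndex are made-up names.  A real run over thousands of articles would also need to walk through many archive pages, which this does not attempt.

```javascript
// index.js (hypothetical name): run with
//   phantomjs index.js <archive-page-url>

// Pure helper (illustrative): sort entries by title, ignoring case,
// and emit one "title<TAB>url" line per article.
function formatIndex(entries) {
  var sorted = entries.slice().sort(function (a, b) {
    var x = a.title.toLowerCase();
    var y = b.title.toLowerCase();
    return x < y ? -1 : (x > y ? 1 : 0);
  });
  var lines = [];
  for (var i = 0; i < sorted.length; i++) {
    lines.push(sorted[i].title + '\t' + sorted[i].url);
  }
  return lines.join('\n');
}

// Only touch the PhantomJS API when actually running under PhantomJS.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var archiveUrl = require('system').args[1];

  page.open(archiveUrl, function (status) {
    if (status !== 'success') {
      console.log('Could not load ' + archiveUrl);
      phantom.exit(1);
      return;
    }

    // Runs inside the page: collect the title and link of each post.
    // The selector is an assumption about the page's markup.
    var entries = page.evaluate(function (sel) {
      var links = document.querySelectorAll(sel);
      var found = [];
      for (var i = 0; i < links.length; i++) {
        found.push({
          title: links[i].textContent.replace(/^\s+|\s+$/g, ''),
          url: links[i].href
        });
      }
      return found;
    }, 'h3.post-title a');

    console.log(formatIndex(entries));
    phantom.exit();
  });
}
```

Redirecting the output to a file gives a plain tab-separated list which can then be turned into whatever kind of index page you like.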

There are limitless applications for such code, and PhantomJS can play a part in your game.  I have not used it as yet, but there does seem to be some cool potential in it.
