Web Scraping for Beginners Using CasperJS and Phantombuster

This post is a little different from what I usually write. We will look at web scraping by learning CasperJS and Phantombuster. So what are we waiting for? Let us explore.

What if I told you that by running a script you can browse and interact with websites? And why stop there: besides interacting, you can also gather data from websites without even opening a browser. Surprised? Yes, it is possible. The concept is called “web scraping”. There are various ways to do web scraping, but I am going to show you one that I find easy to start with.

CasperJS allows you to build full navigation scenarios using high-level functions and a straightforward interface, for tasks of all sizes. It queues steps such as “open this page”, “wait for this element” and “click this link”, then runs them in order, which makes web scraping easy and effective.
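To give a rough idea of what a “navigation scenario” means, here is a minimal sketch of how steps are queued and then run in order. The names buildScenario and createFakeCasper are my own for illustration and are not part of CasperJS. The fake object only records and replays steps so the sketch can run under plain Node.js; in a real script you would pass the object returned by require('casper').create() instead.

```javascript
// Registers a tiny scenario on any CasperJS-style object.
// `casper` is expected to expose start(url, fn), then(fn) and run(fn),
// which is how CasperJS queues navigation steps.
function buildScenario(casper, url, log) {
    casper.start(url, function () {
        log('Page loaded: ' + url);
    });
    casper.then(function () {
        log('Next step runs after the previous one');
    });
    casper.run(function () {
        log('All steps executed');
    });
}

// Hypothetical stand-in for CasperJS (assumption: illustration only).
// It loads no pages; it just stores the steps and replays them in order.
function createFakeCasper() {
    var steps = [];
    return {
        start: function (url, fn) { steps.push(fn); },
        then: function (fn) { steps.push(fn); },
        run: function (done) {
            steps.forEach(function (step) { step.call(this); }, this);
            done.call(this);
        }
    };
}

var messages = [];
buildScenario(createFakeCasper(), 'http://example.com/', function (m) {
    messages.push(m);
});
console.log(messages.join('\n'));
```

Nothing happens when start() and then() are called; the steps only execute once run() is called. The full script later in this post follows the same pattern.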

Phantombuster lets you execute code in the cloud to emulate what a human would do in a browser, so you can collect, move and process data on the web. It provides an easy user interface for writing and executing scripts. It also supports Nick (Phantombuster’s custom navigation module), Node.js and PhantomJS. Phantombuster also provides agents, cloud storage and many more features.

Now let’s start our demo, in which we will search for “BBC News” on Google, click on the first link and finally fetch the latest feeds.


Step 1: Log in to Phantombuster

First of all, go to Phantombuster and register. It is easy and free. After that you will be able to see your dashboard. It should look like the image below.

 

Image of dashboard of phantombuster

 


Step 2: Create a new script

Now go to the script section and create a new script with any name. I have named mine “newsFeeds”. The bot type should be CasperJS.

 
Image of phantom buster

Step 3: Time to write some code

Copy the code below and paste it into your script file.

 

'use strict';
'phantombuster command: casperjs';
// 'phantombuster dependencies: '
'phantombuster package: 2';

var casper = require('casper').create({
    colorizerType: 'Dummy',
    pageSettings: {
        userAgent: 'Mozilla/5.0 (X11; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0'
    },
    viewportSize: {
        width: 1280,
        height: 1024
    }
});
var buster = require('phantombuster').create(casper);
// ------------ >8 ------------

// Log the error and stop the script with a non-zero exit code.
var exitWithError = function(err) {
    console.log('Error: ' + err);
    casper.exit(1);
};

// Open Google and type the query into the search box.
casper.start('http://www.google.com/', function() {
    console.log("Page loaded");
    console.log("Searching in google for BBC News");
    this.sendKeys('input[name=q]', 'BBC News');
});

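// Wait up to 10 seconds for the first result, then click it.
// The third argument is the timeout callback; the timeout itself comes fourth.
casper.waitForSelector('#rso > div:nth-child(1) > div > div > h3', function() {
    console.log("Google result found");
    this.click('#rso > div:nth-child(1) > div > div > h3 > a');
}, function() {
    exitWithError('Google result not found');
}, 10000);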

// Give the BBC page time to load.
casper.wait(5000, function() {
    console.log("Navigating to http://www.bbc.com/");
});

// Read the text of the headline elements.
casper.then(function() {
    console.log('Fetching latest feeds');
    var title = this.fetchText('h3[class="gs-c-promo-heading__title gel-pica-bold nw-o-link-split__text"]');
    console.log(title);
});

casper.waitUntilVisible('span');

// Take a screenshot of the page.
casper.then(function() {
    casper.capture('screenshot.jpg');
});

// Upload the screenshot to Phantombuster storage and expose its URL as the result.
casper.then(function() {
    buster.save('screenshot.jpg', function(err, url) {
        if (err)
            return exitWithError(err);
        console.log('Screenshot saved: ' + url);

        buster.setResultObject({
            screenshotUrl: url
        }, function(err) {
            if (err)
                exitWithError(err);
        });
    });
});

casper.run(function() {
    console.log('All navigation steps executed');
    casper.exit();
});


Step 4: Execute the script

Click on the launch button to execute the script. The console output will show the latest feeds from BBC News, followed by the URL of the saved screenshot.

 
Image of executed script
 

I hope you find this content useful. If you have any queries or suggestions, feel free to ask. I would be more than happy to answer.

 
