This time I am writing about something different from my usual topics. In this blog, we will discuss web scraping using CasperJS and Phantombuster. So, what are we waiting for? Let us explore.
What if I told you that by running a script you can browse and interact with websites? Why stop there: you can also gather data from websites without even opening a browser. Surprised? Yes, it is possible. The concept is called “web scraping”. There are various ways to do web scraping, but I am going to show you the one I find easiest.
CasperJS allows you to build full navigation scenarios using high-level functions and a straightforward interface, for tasks of any size. CasperJS makes web scraping easy and effective.
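To get a feel for what a "navigation scenario" means, it helps to think of it as a queue of steps: each step is registered first, and nothing runs until the scenario is started. The sketch below is only a toy model of that idea in plain JavaScript. It is synchronous, the names createScenario, then and run are chosen here for illustration, and it is not how CasperJS is implemented internally.

```javascript
// Toy model of a step queue. NOT CasperJS's implementation, and synchronous only.
function createScenario() {
  var steps = [];
  var scenario = {
    // Queue a step; nothing is executed yet.
    then: function (step) {
      steps.push(step);
      return scenario; // returning the scenario allows chaining
    },
    // Execute every queued step in the order it was added, then call onComplete.
    run: function (onComplete) {
      steps.forEach(function (step) { step(); });
      if (typeof onComplete === 'function') { onComplete(); }
    }
  };
  return scenario;
}

// Usage: the three steps mirror the demo (search, click, fetch).
var log = [];
createScenario()
  .then(function () { log.push('search'); })
  .then(function () { log.push('click first result'); })
  .then(function () { log.push('fetch headlines'); })
  .run(function () { log.push('done'); });

console.log(log.join(' -> '));
// prints "search -> click first result -> fetch headlines -> done"
```

The point to take away is that the order in which steps are registered is the order in which they run, which is why the order of the casper calls in a script matters.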
Phantombuster lets you execute code in the cloud to emulate what a human would do in a browser, so you can easily collect, move and process data on the web. Phantombuster provides a simple user interface to write and execute scripts. It supports Nick (Phantombuster's custom navigation module), Node.js and PhantomJS. Phantombuster also provides agents, cloud storage, and many more features.
Now let's start our demo, in which we will search for “BBC News” on Google, click on the first link, and finally fetch the latest feeds.
Step 1: Log in to Phantombuster
First of all, go to the Phantombuster website and register. It is easy and free. After that, you will be able to see your dashboard. It should look like the image below.
Step 2: Create a new script
Now go to the script section and create a new script with any name you like. I have named mine “newsFeeds”. The bot type should be CasperJS.
Step 3: Time to write some code
Copy the code below and paste it into your script file.
'use strict';
'phantombuster command: casperjs';
// 'phantombuster dependencies: '
'phantombuster package: 2';

// Create the CasperJS instance with a desktop user agent and viewport.
var casper = require('casper').create({
    colorizerType: 'Dummy',
    pageSettings: {
        userAgent: 'Mozilla/5.0 (X11; Linux x86_64; rv:40.0) Gecko/20100101 Firefox/40.0'
    },
    viewportSize: {
        width: 1280,
        height: 1024
    }
});
var buster = require('phantombuster').create(casper);

// ---------- >8 ----------

// Log the error and stop the script with a non-zero exit code.
var exitWithError = function(err) {
    console.log('Error: ' + err);
    casper.exit(1);
};

// Open Google and type the query into the search box.
casper.start('http://www.google.com/', function() {
    console.log('Page loaded');
    console.log('Searching in google for BBC News');
    this.sendKeys('input[name=q]', 'BBC News', { keepFocus: true });
    // Press Enter to submit the search form.
    this.sendKeys('input[name=q]', casper.page.event.key.Enter, { keepFocus: true });
});

// Wait up to 10 seconds for the first result, then click it.
casper.waitForSelector('#rso > div:nth-child(1) > div > div > h3', function() {
    console.log('Google result found');
    this.click('#rso > div:nth-child(1) > div > div > h3 > a');
}, 10000);

// Give the BBC page 5 seconds to load.
casper.wait(5000, function() {
    console.log('Navigating to http://www.bbc.com/');
});

// Read the text of the headline elements and print it.
casper.then(function() {
    console.log('Fetching latest feeds');
    var title = this.fetchText('h3[class="gs-c-promo-heading__title gel-pica-bold nw-o-link-split__text"]');
    console.log(title);
});

casper.waitUntilVisible('span');

// Take a screenshot of the page.
casper.then(function() {
    casper.capture('screenshot.jpg');
});

// Upload the screenshot to Phantombuster storage and expose its URL as the result.
casper.then(function() {
    buster.save('screenshot.jpg', function(err, url) {
        if (err) return exitWithError(err);
        console.log('Screenshot saved: ' + url);
        buster.setResultObject({ screenshotUrl: url }, function(err) {
            if (err) return exitWithError(err);
        });
    });
});

casper.run(function() {
    console.log('All navigation steps executed');
    casper.exit();
});
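One thing to keep in mind about the fetchText call in the script: when the selector matches several elements, CasperJS concatenates their text into one string, so the headlines come out run together. If you would rather have one headline per entry, a small clean-up helper can be applied to a list of elements instead. The sketch below is only an illustration: cleanHeadlines is a hypothetical helper written for this post, not part of CasperJS or Phantombuster, and the sample data is made up. It assumes each element is an object with a text property, which is the shape returned by casper.getElementsInfo(selector) in CasperJS 1.1 (check that your version provides it).

```javascript
// Hypothetical helper: turns a list of element-info objects (each with a
// `text` property) into a clean array of unique headline strings.
function cleanHeadlines(elements) {
  var headlines = [];
  (elements || []).forEach(function (element) {
    // Collapse runs of whitespace into single spaces and trim both ends.
    var text = String(element.text || '').replace(/\s+/g, ' ').trim();
    // Skip empty strings and headlines that were already collected.
    if (text && headlines.indexOf(text) === -1) {
      headlines.push(text);
    }
  });
  return headlines;
}

// Made-up sample data standing in for what a page might return.
var sample = [
  { text: '  First headline ' },
  { text: 'Second\n   headline' },
  { text: 'First headline' },
  { text: '   ' }
];

// Two entries remain: 'First headline' and 'Second headline'.
console.log(cleanHeadlines(sample));
```

Inside a CasperJS step, the same helper could be called as cleanHeadlines(this.getElementsInfo(selector)) with the headline selector from the script, and each entry printed on its own line.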
Step 4: Execute the script
Click the Launch button to execute the script. The console output will now show the latest feeds from BBC News.
I hope you find this content useful. If you have any queries or suggestions, feel free to ask. I would be more than happy to answer.