
PhantomJS and Jsoup with Spring Boot



You'll need PhantomJS installed locally and on the PATH. You can set this up by following my Install Instructions for Mac.

Gradle Dependencies

In your Spring Boot project, add the following repositories and dependencies to the build.gradle file.

repositories {
  mavenCentral()
  maven { url 'https://jitpack.io' }
  ...
}

dependencies {
  ...
  compile('org.jsoup:jsoup:1.8.3')
  compile('com.github.jarlakxen:embedphantomjs:3.0')
  compile('com.github.detro:ghostdriver:2.1.0')
  ...
}


Code

The following code loads the page in PhantomJS, reads the HTML after the page's JavaScript has run, and parses it with Jsoup. The key line is the call to Jsoup.parseBodyFragment(sourceHtml).

// Imports
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriver;
import org.openqa.selenium.remote.DesiredCapabilities;

// Inside the code somewhere...
DesiredCapabilities caps = new DesiredCapabilities();
caps.setJavascriptEnabled(true);
WebDriver driver = new PhantomJSDriver(caps);
driver.get(urlString);
// Keep the rendered HTML in a variable in case you need to debug it.
String sourceHtml = driver.getPageSource();
// Use Jsoup to parse the HTML.
Element document = Jsoup.parseBodyFragment(sourceHtml);
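// quit() ends the WebDriver session and shuts down PhantomJS; close() only closes the current window.
driver.quit();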

Logging Message

When the code runs, the following log messages appear in the Spring Boot console.
The "executable:" line shows the path where phantomjs is installed.

INFO 1929 --- [nio-8080-exec-1] o.o.s.phantomjs.PhantomJSDriverService   : executable: /<Local Install Path>/phantomjs-2.1.1-macosx/bin/phantomjs
INFO 1929 --- [nio-8080-exec-1] o.o.s.phantomjs.PhantomJSDriverService   : port: 1112
INFO 1929 --- [nio-8080-exec-1] o.o.s.phantomjs.PhantomJSDriverService   : arguments: [--webdriver=1112, --webdriver-logfile=/<Working Directory>/phantomjsdriver.log]
INFO 1929 --- [nio-8080-exec-1] o.o.s.phantomjs.PhantomJSDriverService   : environment: {}
[INFO] GhostDriver - Main - running on port 1112
[INFO] Session [90a9fb50-7179-11e7-bf8f-09865228037e] - page.settings - {"XSSAuditingEnabled":false,"javascriptCanCloseWindows":true,"javascriptCanOpenWindows":true,"javascriptEnabled":true,"loadImages":true,"localToRemoteUrlAccessEnabled":false,"userAgent":"Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1","webSecurityEnabled":true}
[INFO] Session [90a9fb50-7179-11e7-bf8f-09865228037e] - page.customHeaders:  - {}
[INFO] Session [90a9fb50-7179-11e7-bf8f-09865228037e] - Session.negotiatedCapabilities - {"browserName":"phantomjs","version":"2.1.1","driverName":"ghostdriver","driverVersion":"1.2.0","platform":"mac-unknown-64bit","javascriptEnabled":true,"takesScreenshot":true,"handlesAlerts":false,"databaseEnabled":false,"locationContextEnabled":false,"applicationCacheEnabled":false,"browserConnectionEnabled":false,"cssSelectorsEnabled":true,"webStorageEnabled":false,"rotatable":false,"acceptSslCerts":false,"nativeEvents":true,"proxy":{"proxyType":"direct"}}
[INFO] SessionManagerReqHand - _postNewSessionCommand - New Session Created: 90a9fb50-7179-11e7-bf8f-09865228037e
INFO 1929 --- [ null to remote] o.o.selenium.remote.ProtocolHandshake    : Detected dialect: OSS
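
If you want to pull the install path out of the "executable:" line programmatically, for example in a startup check, a regular expression is enough. This is only a sketch: the class PhantomJsLogLine and the method executablePath are hypothetical names for this example, and the pattern assumes the log layout shown above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhantomJsLogLine {

    // Assumes the layout "...PhantomJSDriverService   : executable: <path>".
    private static final Pattern EXECUTABLE =
            Pattern.compile("PhantomJSDriverService\\s*:\\s*executable:\\s*(.+)$");

    // Hypothetical helper: returns the path from an "executable:" log line,
    // or an empty Optional for any other line.
    public static Optional<String> executablePath(String logLine) {
        Matcher matcher = EXECUTABLE.matcher(logLine);
        return matcher.find()
                ? Optional.of(matcher.group(1).trim())
                : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "INFO 1929 --- [nio-8080-exec-1] "
                + "o.o.s.phantomjs.PhantomJSDriverService   : "
                + "executable: /<Local Install Path>/phantomjs-2.1.1-macosx/bin/phantomjs";
        System.out.println(executablePath(line).orElse("no executable line found"));
    }
}
```

Lines such as the "port:" or "arguments:" entries do not match the pattern, so the helper returns an empty Optional for them.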

Install PhantomJSDriver on Mac bypassing "Operation not permitted"



Introduction

I use the PhantomJS driver for screen scraping. The pages I load use JavaScript to build their HTML, so a plain HTTP fetch is not enough. PhantomJS runs that JavaScript and gives back the rendered HTML.

I downloaded PhantomJS from their Download page. When I tried to put it into my /usr/bin directory so it would be on my path, I got the following error message.

Error Message:
cp ~/Tools/phantomjs-2.1.1-macosx/bin/phantomjs /usr/bin/
cp: /usr/bin/phantomjs: Operation not permitted


How I got around this Error

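Recent versions of macOS protect /usr/bin with System Integrity Protection, which is the most likely cause of this error. Instead of copying into /usr/bin, I keep development tools like this in a ~/Tools directory and use the ~/.bash_profile file to add the PhantomJS bin directory to the $PATH.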


  1. Unzip PhantomJS into the ~/Tools directory.
  2. Add the location of phantomjs to the $PATH via the ~/.bash_profile:
     a. Open the file: vim ~/.bash_profile
     b. Add this line: export PATH=$PATH:~/Tools/phantomjs-2.1.1-macosx/bin
     c. Reload the profile: source ~/.bash_profile
  3. Test that it is on your PATH by running `phantomjs`. Press `Ctrl+C` to exit.
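
The same check can be made from Java before the driver starts. Here is a minimal sketch that searches the directories listed in PATH for an executable file named phantomjs. The class name PhantomJsPathCheck and the helper findOnPath are made up for this example, and the lookup only approximates how a shell resolves commands.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

class PhantomJsPathCheck {

    // Hypothetical helper: returns the first executable regular file called
    // `name` found in the directories listed in `pathValue`, if any.
    public static Optional<Path> findOnPath(String name, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as "a::b"
            }
            Path candidate = Paths.get(dir, name);
            if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Reads the PATH of the process that launched the JVM.
        Optional<Path> found = findOnPath("phantomjs", System.getenv("PATH"));
        System.out.println(found
                .map(path -> "phantomjs found at " + path)
                .orElse("phantomjs is not on the PATH"));
    }
}
```

Note that the JVM sees the PATH of whatever launched it. An IDE or service started outside your terminal may not have read ~/.bash_profile, so this check can fail there even when `phantomjs` works in the shell.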

