Webscraping with R using a Raspberry Pi

Setting up the Raspberry Pi

After the basic setup, i.e.

  • bought a Raspberry Pi Starter Kit
  • flashed the SD Card with Raspbian
  • ran raspi-config
  • installed R with sudo apt-get install r-base, which gave me R 3.1.1

I started installing the R packages I usually need for my cron-job tasks (mostly webscraping). I ran into problems with the rvest package because several of its dependencies could not be installed. There may be a more efficient way, but these are the steps I took:

Install packages for webscraping

To install xml and the related R packages (rvest), I needed libxml2 on the system. Although apt-get already had it, I ended up installing it manually:

wget ftp://xmlsoft.org/libxml2/libxml2-2.9.2.tar.gz
tar -xzvf libxml2-2.9.2.tar.gz
cd libxml2-2.9.2/

I also needed python-dev to make libxml2 compile.

sudo apt-get update
sudo apt-get install python-dev

Then I built libxml2:

./configure --prefix=/usr --disable-static --with-history && make
sudo make install

I also had problems with the curl package. The failed installation suggested installing libcurl4-openssl-dev, so:

sudo apt-get install libcurl4-openssl-dev

The last problem was the openssl package. Again, I followed the suggestion from the failed R package installation and installed libssl-dev:

sudo apt-get install libssl-dev

After that, rvest installed nicely. However, it took quite a while for the Pi to install all dependencies.
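
To see which R packages are still missing after installing the system libraries, a small check like the following may help. This is only a sketch: the package list is my assumption about what rvest needs on this setup, and missing_pkgs() is a hypothetical helper, not part of any package.

```r
# Assumed list of packages; adjust it to what your scripts actually load.
needed <- c("xml2", "curl", "openssl", "rvest")

# Hypothetical helper: return the packages that cannot be loaded.
missing_pkgs <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Usage (not run here, because compiling packages on a Pi is slow):
# install.packages(missing_pkgs(needed))
```

Running install.packages() only on the missing packages avoids recompiling what already works.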

Webscraping Example – A simple frost warning for my plants

One simple task my Raspberry Pi does for me is to email me a frost warning if, at 6 pm, the forecast for the coming night drops below 3 °C. For this I got an API key at openweathermap.org. Mind that openweathermap.org does not like frequent requests (keep it to fewer than one per 10 minutes); at the beginning I got blocked.

You can then request some JSON for your city ID using your APPID (API Key):

library(jsonlite)
wd_json <- fromJSON("http://api.openweathermap.org/data/2.5/forecast/city?id=CITY_ID_GOES_HERE&APPID=YOUR_API_KEY_GOES_HERE")
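
Because of the request limit mentioned above, it can help to cache the response on disk and only ask the API again once the cached copy is older than ten minutes. The following is a minimal sketch under my own assumptions: cache_is_fresh() and fetch_cached() are hypothetical helpers, the cache path is made up, and the ten-minute default simply mirrors the limit I ran into.

```r
# Hypothetical helper: TRUE if the cache file exists and is younger than max_age_mins.
cache_is_fresh <- function(path, max_age_mins = 10) {
  if (!file.exists(path)) return(FALSE)
  age <- difftime(Sys.time(), file.mtime(path), units = "mins")
  as.numeric(age) < max_age_mins
}

# Hypothetical helper: download only when the cached copy is missing or stale.
fetch_cached <- function(url, path, max_age_mins = 10) {
  if (!cache_is_fresh(path, max_age_mins)) {
    download.file(url, destfile = path, quiet = TRUE)
  }
  path
}

# Usage (needs network access and a valid API key, so it is not run here):
# cache <- path.expand("~/owm_forecast.json")   # assumed location
# wd_json <- jsonlite::fromJSON(fetch_cached(url, cache))
```

While testing the script repeatedly, this keeps the number of real requests low.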

Then tidy the result and extract the values needed. Temperatures are given in kelvin, so we need to convert them to degrees Celsius. The timestamp I convert to a POSIX date-time.

wd <- wd_json$list
wd$Datum <- as.character(as.POSIXct(wd$dt, origin="1970-01-01", tz="Europe/Berlin"))
wd$Celsius_min <- wd$main$temp_min-273.15
wd$Celsius_max <- wd$main$temp_max-273.15
wd$Celsius_mean <- wd$main$temp-273.15
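
To check the conversion without calling the API, the same steps can be run on a tiny hand-made data frame. The timestamps and temperatures below are made up for illustration, and kelvin_to_celsius() is a hypothetical helper.

```r
# Hypothetical helper: kelvin to degrees Celsius.
kelvin_to_celsius <- function(k) k - 273.15

# Made-up values shaped like the fields used above (Unix seconds, kelvin).
toy <- data.frame(dt = c(1425405600, 1425416400),
                  temp_min = c(274.65, 271.15))

# Same transformations as in the script, applied to the toy data.
toy$Datum <- as.character(as.POSIXct(toy$dt, origin = "1970-01-01", tz = "Europe/Berlin"))
toy$Celsius_min <- kelvin_to_celsius(toy$temp_min)
print(toy[, c("Datum", "Celsius_min")])
```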

Sending results via email

Now for the part that sends the mail:

library(sendmailR)
library(xtable)

# Keep only the forecasts for the next 24 hours.
wd <- wd[as.POSIXct(wd$Datum, tz = "Europe/Berlin") < Sys.time() + 86400, ]

# Send a mail only if any forecast minimum is below 3 °C.
if (any(wd$Celsius_min < 3)) {
  # Render the affected rows as an HTML table (print() returns the HTML string).
  dispatch <- print(xtable(wd[wd$Celsius_min < 3, c("Datum", "Celsius_min", "Celsius_mean", "Celsius_max")]), type = "html")
  msg <- mime_part(paste0('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<title>HTML demo</title>
<style type="text/css">
</style>
</head>
<body><h2>Frostwarnung</h2>',
    dispatch,
    '</body>
</html>'))
  ## Override content type.
  msg[["headers"]][["Content-Type"]] <- "text/html"
  from <- sprintf("<sendmailR@%s>", Sys.info()[4])
  to <- "<YOUR@EMAIL_GOES_HERE.COM>"
  subject <- paste("Frostwarnung", date())
  body <- list(msg)
  sendmail(from, to, subject, body, control = list(smtpServer = "ASPMX.L.GOOGLE.COM"))
}
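
The two decisions at the top of the script (which rows fall within the next 24 hours, and whether any of them is below the threshold) can be tried separately from the mailing part. The following is a minimal sketch with made-up data; frost_rows() is a hypothetical helper, and it compares the date-times as POSIXct values rather than as character strings.

```r
# Hypothetical helper: rows within the next 24 hours whose minimum is below `limit`.
frost_rows <- function(wd, now = Sys.time(), limit = 3, tz = "Europe/Berlin") {
  when <- as.POSIXct(wd$Datum, tz = tz)
  wd[when < now + 86400 & wd$Celsius_min < limit, , drop = FALSE]
}

# Made-up forecast: one mild night-time value, one frost, one frost beyond 24 hours.
now <- as.POSIXct("2015-03-03 18:00:00", tz = "Europe/Berlin")
toy <- data.frame(
  Datum = c("2015-03-03 22:00:00", "2015-03-04 04:00:00", "2015-03-05 04:00:00"),
  Celsius_min = c(4.0, -1.5, -3.0),
  stringsAsFactors = FALSE
)
frost <- frost_rows(toy, now = now)

# A mail would be sent only if at least one row is left.
send_warning <- nrow(frost) > 0
```

Passing now as an argument makes the check repeatable, which Sys.time() inside the script does not.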

Finally, we have to tell the Raspberry Pi to run the script every day in the early evening. Save the .R file and add it to your crontab:

crontab -e

The first time you use crontab, you are asked to choose an editor. The easiest to use (at least for me) is nano.
Add the following line:

00 18 * * * Rscript ~/path_to_your/script.R

This adds the script to your cron jobs and schedules it for 18:00 every day of every month.
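
Since cron runs the script without a terminal, errors are easy to miss. One option is to let the script write its own log file. The following is a minimal sketch; log_line() and run_logged() are hypothetical helpers and the log path in the usage comment is an assumption.

```r
# Hypothetical helper: append one timestamped line to a log file.
log_line <- function(logfile, ...) {
  cat(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), ..., "\n",
      file = logfile, append = TRUE)
}

# Hypothetical helper: run a task, log success or the error message,
# and return TRUE/FALSE instead of stopping the script.
run_logged <- function(task, logfile) {
  tryCatch({
    task()
    log_line(logfile, "OK")
    TRUE
  }, error = function(e) {
    log_line(logfile, "ERROR:", conditionMessage(e))
    FALSE
  })
}

# Usage (assumed paths):
# run_logged(function() source("/home/pi/path_to_your/script.R"),
#            "/home/pi/frost_warning.log")
```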