Monday, June 1, 2020

Reading entire URL content is really easy using R

In my good old days, reading the entire content of a website was not easy. Web scraping and extracting the required data took a lot of programming and a few tools. A friend of mine even developed and sold a tool, which he called myrobot during development; he built it in PHP.

Now, it is much easier and one of the many ways is using R.

Here are the steps (which require you to write ONLY two lines of code):

  1. Connect to the website using the url() command
    con <- url("[the website url]", "r")
  2. Then, read the website
    x <- readLines(con)
  3. Do whatever you wish with the data. In this example, I print the first few lines of the page and also copy the whole content to a file (a full sketch follows this list).
    head(x)
    dput(x, "readFromUrlExample.html")
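Putting the three steps together, here is a minimal, self-contained sketch. The URL below is only an example placeholder; swap in whichever page you want to read.

    # Step 1: open a read-only connection to the page (example URL, replace with your own)
    con <- url("https://www.r-project.org/", "r")

    # Step 2: read every line of the page into a character vector
    x <- readLines(con)
    close(con)

    # Step 3: inspect the first few lines and copy the content to a file
    head(x)
    dput(x, "readFromUrlExample.html")

One small note: dput() writes the data as a deparsed R character vector. If you want the file to contain the raw HTML exactly as it was received, writeLines(x, "readFromUrlExample.html") does that instead.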

There you go.

Result of the head(x) function
Snapshot of the content of the file copied into readFromUrlExample.html

The sample source code can be retrieved at:

https://github.com/masteramuk/LearnR-Coursera/blob/master/sample-ReadFromUrl.R
