Back in the good old days, reading the entire content of a website was not easy. Web scraping and extracting the required data took a lot of programming and a few tools. A friend of mine even developed and sold such a tool, which he called (during development) myrobot. He built it in PHP.
Now it is much easier, and one of the many ways is to use R.
Here are the steps (which require you to write ONLY two lines of code):
- Connect to the website using the url() command:
con <- url([the website url], "r")
- Then, read the website:
x <- readLines(con)
- Do whatever you wish with the data. In this example, I print out the head of the website and also copy the whole content to a file:
head(x)
dput(x, "readFromUrlExample.html")
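Putting the steps above together, here is a minimal runnable sketch. The URL is just an example for illustration; substitute any page you want to read.

```r
# Open a read-only connection to the page (example URL, not from the post)
con <- url("https://www.r-project.org/", "r")

# Read the page line by line into a character vector
x <- readLines(con)

# Close the connection once we are done with it
close(con)

# Peek at the first few lines of the page
head(x)

# Dump the whole content to a file, as in the post
dput(x, "readFromUrlExample.html")
```

Note that dput() writes an R-parseable representation of the character vector (readable back with dget()); if you want the raw HTML itself, writeLines() would be the more natural choice.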
There you go.
The sample source code can be retrieved at
https://github.com/masteramuk/LearnR-Coursera/blob/master/sample-ReadFromUrl.R