
List Files On Http/ftp Server In R

I'm trying to get a list of files on an HTTP/FTP server from R, so that in the next step I can download them (or select only the files that meet my criteria and download those).

Solution 1:

You really shouldn't use regex on HTML. The XML package makes this pretty simple: getHTMLLinks() gathers any links we want.

library(XML)

# 'result' holds the HTML of the directory listing
# (see Solution 2 for one way to fetch it with RCurl::getURL())
getHTMLLinks(result)
#  [1] "Interesting file_20150629.txt" "Interesting file_20150630.txt"
#  [3] "Interesting file_20150701.txt" "Interesting file_20150702.txt"
#  [5] "Interesting file_20150703.txt" "Interesting file_20150704.txt"
#  [7] "Interesting file_20150705.txt" "Interesting file_20150706.txt"
#  [9] "Interesting file_20150707.txt" "Interesting file_20150708.txt"
# [11] "Interesting file_20150709.txt"

That will get all /@href links contained in //a elements. To grab only the links that contain .txt, you can use a different XPath query from the default.

getHTMLLinks(result, xpQuery = "//a/@href[contains(., '.txt')]")

Or even more precisely, to get those files that end with .txt, you can do

getHTMLLinks(
    result,
    # keep only hrefs whose last four characters are ".txt"
    xpQuery = "//a/@href['.txt'=substring(., string-length(.) - 3)]"
)
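
Once the matching links are extracted, downloading them is the natural next step. Here is a minimal sketch, assuming a hypothetical base URL "http://server/" that serves the listing and relative file-name links; URLencode() handles the spaces in the file names:

library(XML)
library(RCurl)

base_url <- "http://server/"   # hypothetical listing URL; adjust to your server
result   <- getURL(base_url)

# Keep only the links that end with ".txt", as above.
txt_files <- getHTMLLinks(
    result,
    xpQuery = "//a/@href['.txt'=substring(., string-length(.) - 3)]"
)

# Download each file into the working directory.
for (f in txt_files) {
    download.file(URLencode(paste0(base_url, f)), destfile = f, mode = "wb")
}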

Solution 2:

An alternative without loading additional libraries beyond RCurl is to set ftp.use.epsv = FALSE and crlf = TRUE; the crlf option instructs libcurl to convert \n line endings to \r\n:

library(RCurl)
result <- getURL("http://server", verbose = TRUE, ftp.use.epsv = FALSE,
                 dirlistonly = TRUE, crlf = TRUE)

Then extract the individual URLs to the files using paste and strsplit:

result2 <- paste("http://server", strsplit(result, "\r*\n")[[1]], sep = "")
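
From there you can filter for the files that meet your criteria and download them. A minimal sketch, assuming the entries in result2 are complete URLs (basename() derives a local file name from each):

# Optionally keep only the entries matching a criterion, e.g. .txt files.
txt_urls <- grep("\\.txt$", result2, value = TRUE)

# Download each file into the working directory.
for (u in txt_urls) {
    download.file(URLencode(u), destfile = basename(u), mode = "wb")
}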
