rvest
- `robotstxt::paths_allowed(url)` checks whether the site's robots.txt permits bots to access a page
- `read_html(url)` fetches the content of an HTML page and returns an XML document object
- `html_table()` extracts the tables from a fetched HTML page and returns a list of tibbles (data frames)
    - Process a table: `mytable %>% filter(X1 == "Version:") %>% pull(X2)`
- `html_elements()` gets specific elements by CSS selector and returns a node set
    - By tag: `html_elements(x, "h2")`
    - By id: `html_elements(x, "#current_visitors")`
    - By class: `html_elements(x, ".data")`
- `html_element()` works like `html_elements()`, except that it returns a single node
- `html_text()` gets the text out of nodes
- rvest doesn't work with dynamically loaded content. Workarounds:
    - Download the loaded page as a local file
    - Use another package: RSelenium
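
The functions above can be strung together roughly as follows. This is only a sketch: the HTML snippet, the temp file, and every variable name (`html`, `path`, `page`, `mytable`, and so on) are made up for illustration, and the page is read from a local file so the example runs without network access. It assumes rvest 1.0 or later and dplyr are installed.

```r
library(rvest)
library(dplyr)

# Hypothetical page content, standing in for a page saved from the browser.
html <- '<html><body>
  <h2>Release info</h2>
  <p id="current_visitors">42</p>
  <span class="data">alpha</span>
  <span class="data">beta</span>
  <table>
    <tr><td>Version:</td><td>1.0.4</td></tr>
    <tr><td>License:</td><td>MIT</td></tr>
  </table>
</body></html>'

# Save it as a local file, as one would with a dynamically loaded page.
path <- tempfile(fileext = ".html")
writeLines(html, path)

# For a live site, robotstxt::paths_allowed(url) would come first (needs network).
page <- read_html(path)

# html_table() on a whole page returns a list with one entry per table.
# The table has no header row, so the columns are named X1, X2.
tables <- html_table(page)
mytable <- tables[[1]]
version <- mytable %>% filter(X1 == "Version:") %>% pull(X2)

# Select nodes by tag, id, and class, then extract their text.
headings   <- page %>% html_elements("h2") %>% html_text()
visitors   <- page %>% html_element("#current_visitors") %>% html_text()
data_items <- page %>% html_elements(".data") %>% html_text()
```

Reading from a saved file is the same call as reading from a URL, which is why downloading the loaded page is a workable fallback when the content is generated by JavaScript.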
by zcysxy