Description
A wrapper for ‘ada-url’, a fast, ‘WHATWG’-compliant URL parser written in modern ‘C++’. The package also contains auxiliary functions such as a public suffix extractor.
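A minimal sketch of the public suffix extractor. It assumes the extractor is exported as public_suffix() and accepts a character vector of URLs, as in current adaR releases. The three URLs mirror the example hosts used further below.

```r
library(adaR)

# Example URLs (same hosts as in the main example of this document)
urls <- c("https://www.google.de",
          "https://www.nytimes.com",
          "https://www.sueddeutsche.de")

# Look up the public suffix (the registry-controlled part such as "de"
# or "com") of each URL's host, based on the Public Suffix List
suffixes <- public_suffix(urls)
suffixes
```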
Keywords
- URL Parsing
- Webtracking Data
- Webscraping
Use Cases
URL parsing is an important step in the analysis of webtracking data, e.g. GESIS Web Tracking. The technique has been used in various social science publications, e.g. de León et al. (2023), although not with this package.
The package was used in various webscraping projects for communication research, e.g. paperboy.
Input Data
The input data has to be a character vector of URLs, for example:
urls <- c("https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1")
urls
[1] "https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1"
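Web tracking data usually arrives as a table with one row per page visit rather than as a plain vector. The sketch below shows one way to prepare the input from such a table; the data frame tracking and its columns panelist_id and url are hypothetical, and dropping missing entries is a precaution rather than a documented requirement of adaR.

```r
# Hypothetical web tracking data: one row per page visit
tracking <- data.frame(
  panelist_id = c(1, 1, 2),
  url = c("https://www.google.de/search?q=GESIS",
          "https://www.nytimes.com/",
          NA),
  stringsAsFactors = FALSE
)

# Pull out the URL column as a character vector and drop missing entries
urls <- as.character(tracking$url)
urls <- urls[!is.na(urls)]
urls
```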
Output Data
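The output data is a data frame of parsed URLs with one row per input URL and one column per URL component (href, protocol, username, password, host, hostname, port, pathname, search and hash).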
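Because the result has one row per input URL, its columns can be aggregated directly. The sketch below counts visits per host in a small, made-up vector of URLs. Only ada_url_parse() and its host column are taken from this document; the aggregation itself is plain base R.

```r
library(adaR)

# Made-up browsing history: two Google searches and one news site visit
urls <- c("https://www.google.de/search?q=GESIS",
          "https://www.google.de/search?q=adaR",
          "https://www.nytimes.com/")

# Parse into a data frame: one row per URL, one column per component
parsed <- ada_url_parse(urls)

# Count visits per host, a common first aggregation of tracking data
visits <- table(parsed$host)
visits
```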
Hardware Requirements
adaR runs on any hardware that can run R.
Environment Setup
With R installed, install the package from CRAN:
install.packages("adaR")
How to Use
Please refer to the “Introduction to adaR” for a comprehensive introduction to the package.
The main function of this package is ada_url_parse(), which decomposes a URL into its components.
library(adaR)
urls <- c("https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1",
          "https://www.nytimes.com/2024/06/19/world/africa/sudan-darfur-takeaways.html",
          "https://www.sueddeutsche.de/thema/Fu%C3%9Fball-EM")
ada_url_parse(urls)
href
1 https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1
2 https://www.nytimes.com/2024/06/19/world/africa/sudan-darfur-takeaways.html
3 https://www.sueddeutsche.de/thema/Fußball-EM
protocol username password host hostname port
1 https: www.google.de www.google.de
2 https: www.nytimes.com www.nytimes.com
3 https: www.sueddeutsche.de www.sueddeutsche.de
pathname
1 /search
2 /2024/06/19/world/africa/sudan-darfur-takeaways.html
3 /thema/Fußball-EM
search hash
1 ?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1
2
3
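For web tracking analyses the search column is often the most informative part, e.g. the term a panelist searched for on Google. The sketch below splits that column into key-value pairs. The helper parse_query() is hypothetical and not part of adaR; it handles only simple "key=value" pairs separated by "&" and keeps values exactly as they appear in the search column.

```r
library(adaR)

urls <- c("https://www.google.de/search?q=GESIS&client=ubuntu&hs=ixb&sca_esv=dccc38f8e2930152&sca_upv=1",
          "https://www.nytimes.com/2024/06/19/world/africa/sudan-darfur-takeaways.html")

# The search column holds the query string, e.g. "?q=GESIS&client=ubuntu&..."
queries <- ada_url_parse(urls)$search

# Hypothetical helper: turn one query string into a named character vector
parse_query <- function(query) {
  if (is.na(query)) return(character(0))
  query <- sub("^\\?", "", query)           # drop the leading "?"
  if (query == "") return(character(0))     # URL without a query string
  fields <- strsplit(query, "&", fixed = TRUE)[[1]]
  pairs <- strsplit(fields, "=", fixed = TRUE)
  keys <- vapply(pairs, function(p) p[1], character(1))
  # parameters without a value (e.g. "flag" or "key=") get ""
  values <- vapply(pairs, function(p) if (length(p) > 1) p[2] else "",
                   character(1))
  names(values) <- keys
  values
}

params <- lapply(queries, parse_query)
params[[1]][["q"]]   # the Google search term
```

A value that itself contains "=" is truncated at the second "=", which is acceptable for a quick look at search terms but not for a general-purpose query parser.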
Technical Details
See the official CRAN page for further technical details.
Contact Details
Maintainer: David Schoch david@schochastics.net
Issue Tracker: https://github.com/gesistsa/adaR/issues