This is a tutorial on how to download data from Etsy.
Etsy makes a lot of data available for systematic download through its Application Programming Interface (API). If you sign up as an Etsy developer, you can obtain a key to make queries directly to Etsy. With this key, you can retrieve, among other data, listings, shop attributes, and user information.
Details about the developer program are here:
https://www.etsy.com/developers/
This tutorial will demonstrate how to download such data with R. In particular, we will search for 1) listings, 2) store-level attributes, 3) user-level attributes, and 4) user-level connections.
First, load the library required to parse the API output.
Etsy returns query results in JSON format.
More on JSON here: http://en.wikipedia.org/wiki/JSON
| |
The jsonlite package can translate JSON output into R data frames. | library(jsonlite)
|
Get listings.
There is a wealth of data available, but we'll start by getting the listings.
You will need an API key, which is freely available on request.
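Rather than pasting the key into every URL, one option is to read it from an environment variable. The sketch below assumes a variable named `ETSY_API_KEY` and a helper called `get_api_key`; both names are hypothetical and not part of the Etsy API or of jsonlite.

```r
# Hypothetical helper: read the API key from an environment variable.
# The variable name ETSY_API_KEY is an assumption made for this sketch.
get_api_key <- function(var = "ETSY_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Set the ", var, " environment variable first.")
  }
  key
}

# Demonstration with a placeholder value; in practice the variable would
# be set outside the script, for example in ~/.Renviron.
Sys.setenv(ETSY_API_KEY = "YOUR_API_KEY")
listings_url <- paste0(
  "https://openapi.etsy.com/v2/listings/active?api_key=",
  get_api_key()
)
print(listings_url)
```

This keeps the key out of scripts that might be shared or committed to version control.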
| |
Queries are formed by constructing URLs with specific parameters. | # The URL queries the API to get listings
# The fromJSON function transforms the JSON to an R dataframe
listing <- fromJSON(txt='https://openapi.etsy.com/v2/listings/active?api_key=REDACTED')
|
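Since each query is just a URL with parameters appended, the URL can be assembled programmatically. The following is a minimal sketch: `build_query_url` is a hypothetical helper, and the `limit` parameter is assumed to be accepted by the endpoint; check the Etsy documentation before relying on it.

```r
# Hypothetical helper: join a base URL, an endpoint path, and named
# query parameters into one request URL.
build_query_url <- function(path, params,
                            base = "https://openapi.etsy.com/v2") {
  # Percent-encode each value so spaces and symbols are URL-safe
  encoded <- vapply(
    params,
    function(v) unname(URLencode(as.character(v), reserved = TRUE)),
    character(1)
  )
  query <- paste(names(params), encoded, sep = "=", collapse = "&")
  paste0(base, path, "?", query)
}

# "limit" is an assumed parameter name used for illustration
url <- build_query_url(
  "/listings/active",
  list(api_key = "YOUR_API_KEY", limit = 25)
)
print(url)
```

The resulting string can be passed to `fromJSON(txt = url)` in the same way as the hand-written URLs in this tutorial.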
The JSON is parsed into a list of elements, one of which is a dataframe of item listings. | # Parse into dataframe
etsy_json <- as.data.frame(listing$results)
# The data are quite extensive, so we'll just peek at select variables
head(etsy_json)[,c("listing_id", "price", "quantity", "category_path")]
## listing_id price quantity category_path
## 1 228814339 73.00 2 Housewares, Pillow
## 2 213822853 5900.00 1 Accessories, Wallet
## 3 227779303 45.00 1 Clothing, Women, Dress
## 4 228817180 25.00 1 Art, Painting
## 5 77609441 766.00 100 Jewelry, Ring
## 6 118479349 12.00 1 Accessories, Hair, Scrunchie
|
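To see why `listing$results` is the element to extract, it can help to parse a small JSON string offline. The response text below is made up for illustration; only the `results` field name is taken from the tutorial, and the `count` field is an assumption about the response shape.

```r
library(jsonlite)

# Made-up response text that mimics the shape described above:
# a top-level object whose "results" element is an array of records.
sample_json <- '{
  "count": 2,
  "results": [
    {"listing_id": 1, "price": "73.00", "quantity": 2},
    {"listing_id": 2, "price": "45.00", "quantity": 1}
  ]
}'

# fromJSON returns a list; an array of records becomes a data frame
parsed <- fromJSON(txt = sample_json)
str(parsed, max.level = 1)

listings <- as.data.frame(parsed$results)
print(listings)
```

The same pattern applies to the other queries in this tutorial: the top-level object is a list, and the records of interest sit in one of its elements.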
List store-level attributes.
We can examine a number of store-level attributes.
| |
We'll choose the store “EPUU”. | # The URL queries the API to get the store's active listings
# The fromJSON function transforms the JSON to an R dataframe
store <- fromJSON(txt='https://openapi.etsy.com/v2/shops/epuu/listings/active?api_key=REDACTED')
# Parse into dataframe
store_json <- as.data.frame(store$results)
# The data are quite extensive, so we'll just peek at select variables
head(store_json)[,c("state", "category_id", "category_path", "price",
"views", "num_favorers", "materials")]
## state category_id category_path price views num_favorers
## 1 active 69151567 Jewelry, Necklace 40.00 38 13
## 2 active 69151501 Jewelry, Earrings 38.00 893 214
## 3 active 69154963 Weddings, Accessories 46.00 37 7
## 4 active 69154963 Weddings, Accessories 44.00 27 3
## 5 active 69154963 Weddings, Accessories 38.00 9 1
## 6 active 69154963 Weddings, Accessories 36.00 22 4
## materials
## 1
## 2 gold plated earring hooks
## 3
## 4
## 5
## 6
|
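Once the store listings are in a data frame, they can be summarized with base R. The sketch below rebuilds a few columns from the output printed above rather than calling the API, and it assumes that prices arrive as text (the trailing zeros in the printed output suggest this), so they are converted to numbers first.

```r
# Rows copied from the store output shown above
store_sample <- data.frame(
  category = c("Jewelry, Necklace", "Jewelry, Earrings",
               rep("Weddings, Accessories", 4)),
  price = c("40.00", "38.00", "46.00", "44.00", "38.00", "36.00"),
  views = c(38, 893, 37, 27, 9, 22),
  stringsAsFactors = FALSE
)

# Convert the price strings to numbers before doing arithmetic
store_sample$price <- as.numeric(store_sample$price)

# Mean price and mean views per category
summary_by_category <- aggregate(
  cbind(price, views) ~ category,
  data = store_sample,
  FUN = mean
)
print(summary_by_category)
```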
List individual-level attributes.
We can examine a number of individual-level attributes.
| |
We'll redact the identity of this individual, but we can get their user ID, login name, and feedback count and score. | # The URL queries the API to get a user's profile
# The fromJSON function transforms the JSON to an R dataframe
user <- fromJSON(txt='https://openapi.etsy.com/v2/users/REDACTED?api_key=REDACTED')
# Parse into dataframe
user_json <- as.data.frame(user$results)
# This is a list of the available variables.
# Some of them are marked "NA" because they are not publicly available.
user_json
## user_id login_name creation_tsz user_pub_key referred_by_user_id
## 1 REDACTED REDACTED 1322632980 NA NA
## feedback_info.count feedback_info.score
## 1 7 100
|
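The `creation_tsz` column looks like a Unix timestamp. Assuming it counts seconds since 1970-01-01 UTC, which is an assumption to confirm against the Etsy documentation, it can be converted to a readable date with base R:

```r
# Value taken from the user output shown above
creation_tsz <- 1322632980

# Interpret the number as seconds since the Unix epoch, in UTC
created_at <- as.POSIXct(creation_tsz, origin = "1970-01-01", tz = "UTC")
print(format(created_at, "%Y-%m-%d %H:%M:%S"))
```

The same conversion works on a whole column, for example `as.POSIXct(user_json$creation_tsz, origin = "1970-01-01", tz = "UTC")`.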
List an individual's connections to others.
We can examine the individual's connections with others (i.e., their ego network).
| |
We'll again redact the identities involved, but we can get the user ID, login name, and account creation time of each connected user. | # The URL queries the API to get the user's connected users
# The fromJSON function transforms the JSON to an R dataframe
connection <- fromJSON(txt='https://openapi.etsy.com/v2/users/REDACTED/connected_users?api_key=REDACTED')
# Parse into dataframe
connection_json <- as.data.frame(connection$results)
# The data are quite extensive, so we'll just peek at select variables
head(connection_json[,c("user_id", "login_name", "creation_tsz")])
## user_id login_name creation_tsz
## 1 uid1 1 1354288084
## 2 uid2 2 1322441388
## 3 uid3 3 1360415993
## 4 uid4 4 1282958774
## 5 uid5 5 1350755065
## 6 uid6 6 1282339820
|
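For network analysis, the connected users can be arranged as an edge list with one row per tie from the focal user. This is a minimal sketch using the placeholder IDs from the output above; the label `"ego"` for the focal user is hypothetical, since the real ID is redacted.

```r
# Placeholder for the redacted focal user
ego_id <- "ego"

# Placeholder IDs as they appear in the redacted output above
connected_ids <- c("uid1", "uid2", "uid3", "uid4", "uid5", "uid6")

# One row per connection: from the focal user to each connected user
edges <- data.frame(
  from = ego_id,
  to = connected_ids,
  stringsAsFactors = FALSE
)
print(edges)

# The number of rows is the focal user's degree in this ego network
ego_degree <- nrow(edges)
print(ego_degree)
```

Repeating the query for each connected user and stacking the resulting edge lists with `rbind` would extend the ego network by one step.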