Extract all users and relevant data, including retweeters, quoters, repliers, and mentions.

extract_users(
  tweet_df,
  summarize = TRUE,
  split = FALSE,
  as_tibble = tweetio_as_tibble(),
  ...
)

# S3 method for data.frame
extract_users(
  tweet_df,
  summarize = TRUE,
  split = FALSE,
  as_tibble = tweetio_as_tibble(),
  ...
)

# S3 method for data.table
extract_users(
  tweet_df,
  summarize = TRUE,
  split = FALSE,
  as_tibble = tweetio_as_tibble(),
  ...
)

Arguments

tweet_df

A data frame of tweets, as obtained by read_tweets() or one of {rtweet}'s collection functions, e.g. rtweet::search_tweets().

summarize

logical(1L), Default: TRUE. Whether to aggregate each user's data into a single row containing the most recent non-missing values.

split

logical(1L), Default: FALSE. Whether to split users into separate data frames and return a list of those data frames, retaining all instances where each user was observed. Ignored if summarize is TRUE.

as_tibble

logical(1L), Default: tweetio_as_tibble(). Whether a tibble::tibble() should be returned. Ignored if the {tibble} package is not installed.

...

Arguments passed to or from other methods.
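The aggregation described for summarize = TRUE can be pictured with a small base R sketch. This is not {tweetio}'s implementation, only an illustration of "one row per user, most recent non-missing value per column". The toy observations data frame and the helper latest_non_missing() are hypothetical names introduced here.

```r
# Toy data (hypothetical): two users observed several times, with gaps.
observations <- data.frame(
  user_id      = c("1", "1", "1", "2", "2"),
  timestamp_ms = as.POSIXct(
    c("2019-09-28 18:05:23", "2019-09-28 18:06:00", "2019-09-28 18:07:00",
      "2019-09-28 18:05:30", "2019-09-28 18:08:00"),
    tz = "UTC"
  ),
  name         = c("Ann", NA, NA, NA, "Bob"),
  location     = c("Oslo", "Bergen", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: last non-missing element of x,
# or a typed NA if every element is missing.
latest_non_missing <- function(x) {
  observed <- x[!is.na(x)]
  if (length(observed) > 0L) observed[length(observed)] else x[NA_integer_]
}

# Sort by time so that "last" means "most recent".
observations <- observations[order(observations$timestamp_ms), ]

# One row per user: apply the helper to every column within each user_id.
per_user <- lapply(
  split(observations, observations$user_id),
  function(user_rows) {
    as.data.frame(lapply(user_rows, latest_non_missing),
                  stringsAsFactors = FALSE)
  }
)
summarized <- do.call(rbind, per_user)
summarized
```

In this toy data, user "1" keeps the name from the earliest row and the location from the second row, because later rows are missing those fields. User "2" has no recorded location, so that value stays NA.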

Examples

path_to_tweet_file <- example_tweet_file()
tweet_df <- read_tweets(path_to_tweet_file)

extract_users(tweet_df, as_tibble = TRUE)
#> # A tibble: 1,356 x 20
#>    user_id timestamp_ms        name  screen_name location description url
#>    <chr>   <dttm>              <chr> <chr>       <chr>    <chr>       <chr>
#>  1 194250… 2019-09-28 18:05:23 El S… Stgo_centro "CHILE"  aportando … NA
#>  2 340309… 2019-09-28 18:05:23 Juan… jg_valdes   " Santi… ex Cancill… NA
#>  3 825459… 2019-09-28 18:05:24 tayyy taylorxkas… "cali"   ya girl ta… NA
#>  4 218889… 2019-09-28 18:05:24 Mark… markaduck   "On you… #Sooners #… NA
#>  5 966825… 2019-09-28 18:05:24 unem… catholicnu… "♓︎"      I’M NOT A … NA
#>  6 319340… 2019-09-28 18:05:24 NA    _CeeDeeThr… NA       NA          NA
#>  7 278758… 2019-09-28 18:05:24 NA    K1          NA       NA          NA
#>  8 401300… 2019-09-28 18:05:25 mari… unmario     "Bentiv… «siamo tut… NA
#>  9 294908… 2019-09-28 18:05:25 Ana … ALmardoza21 "Juárez… Hablo sola… NA
#> 10 107959… 2019-09-28 18:05:25 NA    EsrodKatia  NA       NA          NA
#> # … with 1,346 more rows, and 13 more variables: protected <lgl>,
#> #   followers_count <int>, friends_count <int>, listed_count <int>,
#> #   statuses_count <int>, favourites_count <int>, account_created_at <dttm>,
#> #   verified <lgl>, profile_url <chr>, account_lang <chr>,
#> #   profile_banner_url <chr>, profile_image_url <chr>, bbox_coords <list>
extract_users(tweet_df, summarize = FALSE, as_tibble = TRUE)
#> # A tibble: 1,917 x 20
#>    user_id timestamp_ms        name  screen_name location description url
#>    <chr>   <dttm>              <chr> <chr>       <chr>    <chr>       <chr>
#>  1 194250… 2019-09-28 18:05:23 El S… Stgo_centro "CHILE"  aportando … NA
#>  2 825459… 2019-09-28 18:05:24 tayyy taylorxkas… "cali"   ya girl ta… NA
#>  3 218889… 2019-09-28 18:05:24 Mark… markaduck   "On you… #Sooners #… NA
#>  4 401300… 2019-09-28 18:05:25 mari… unmario     "Bentiv… «siamo tut… NA
#>  5 294908… 2019-09-28 18:05:25 Ana … ALmardoza21 "Juárez… Hablo sola… NA
#>  6 111622… 2019-09-28 18:05:27 भरमे… singhbhrme… "Bhopal… कर्म ही पू… NA
#>  7 231344… 2019-09-28 18:05:27 bárb… barbimoral… "perdid… maldita le… http…
#>  8 437465… 2019-09-28 18:05:27 Javi  javii_sotoo "Califo… NA          NA
#>  9 117288… 2019-09-28 18:05:29 سما … nLfRH1      NA       لا اله الا… NA
#> 10 113247… 2019-09-28 18:05:30 Becc… bexxxv97    NA       im here fo… NA
#> # … with 1,907 more rows, and 13 more variables: protected <lgl>,
#> #   followers_count <int>, friends_count <int>, listed_count <int>,
#> #   statuses_count <int>, favourites_count <int>, account_created_at <dttm>,
#> #   verified <lgl>, profile_url <chr>, account_lang <chr>,
#> #   profile_banner_url <chr>, profile_image_url <chr>, bbox_coords <list>
split_users <- extract_users(tweet_df, summarize = FALSE, split = TRUE, as_tibble = TRUE)

# first 3 users with more than 5 observations.
split_users[vapply(split_users, function(.x) nrow(.x) > 5, logical(1L))][1:3]
#> $`4167284315`
#> # A tibble: 10 x 21
#>    user_id observation_type timestamp_ms        name  screen_name location
#>    <chr>   <chr>            <dttm>              <chr> <chr>       <chr>
#>  1 416728… retweet          2019-09-28 18:05:27 ʀᴏʙs  robs_ldnn   London,…
#>  2 416728… mentions         2019-09-28 18:05:27 NA    robs_ldnn   NA
#>  3 416728… quoted           2019-09-28 18:05:54 ʀᴏʙs  robs_ldnn   London,…
#>  4 416728… retweet          2019-09-28 18:06:24 ʀᴏʙs  robs_ldnn   London,…
#>  5 416728… mentions         2019-09-28 18:06:24 NA    robs_ldnn   NA
#>  6 416728… quoted           2019-09-28 18:07:58 ʀᴏʙs  robs_ldnn   London,…
#>  7 416728… quoted           2019-09-28 18:09:34 ʀᴏʙs  robs_ldnn   London,…
#>  8 416728… quoted           2019-09-28 18:10:24 ʀᴏʙs  robs_ldnn   London,…
#>  9 416728… retweet          2019-09-28 18:13:04 ʀᴏʙs  robs_ldnn   London,…
#> 10 416728… mentions         2019-09-28 18:13:04 NA    robs_ldnn   NA
#> # … with 15 more variables: description <chr>, url <chr>, protected <lgl>,
#> #   followers_count <int>, friends_count <int>, listed_count <int>,
#> #   statuses_count <int>, favourites_count <int>, account_created_at <dttm>,
#> #   verified <lgl>, profile_url <chr>, account_lang <chr>,
#> #   profile_banner_url <chr>, profile_image_url <chr>, bbox_coords <list>
#>
#> $`1217220278`
#> # A tibble: 14 x 21
#>    user_id observation_type timestamp_ms        name  screen_name location
#>    <chr>   <chr>            <dttm>              <chr> <chr>       <chr>
#>  1 121722… retweet          2019-09-28 18:05:30 Joré  jukaziraw   Las Veg…
#>  2 121722… mentions         2019-09-28 18:05:30 NA    jukaziraw   NA
#>  3 121722… retweet          2019-09-28 18:08:53 Joré  jukaziraw   Las Veg…
#>  4 121722… mentions         2019-09-28 18:08:53 NA    jukaziraw   NA
#>  5 121722… retweet          2019-09-28 18:09:07 Joré  jukaziraw   Las Veg…
#>  6 121722… mentions         2019-09-28 18:09:07 NA    jukaziraw   NA
#>  7 121722… retweet          2019-09-28 18:11:31 Joré  jukaziraw   Las Veg…
#>  8 121722… mentions         2019-09-28 18:11:31 NA    jukaziraw   NA
#>  9 121722… retweet          2019-09-28 18:13:34 Joré  jukaziraw   Las Veg…
#> 10 121722… mentions         2019-09-28 18:13:34 NA    jukaziraw   NA
#> 11 121722… retweet          2019-09-28 18:14:02 Joré  jukaziraw   Las Veg…
#> 12 121722… mentions         2019-09-28 18:14:02 NA    jukaziraw   NA
#> 13 121722… retweet          2019-09-28 18:14:50 Joré  jukaziraw   Las Veg…
#> 14 121722… mentions         2019-09-28 18:14:50 NA    jukaziraw   NA
#> # … with 15 more variables: description <chr>, url <chr>, protected <lgl>,
#> #   followers_count <int>, friends_count <int>, listed_count <int>,
#> #   statuses_count <int>, favourites_count <int>, account_created_at <dttm>,
#> #   verified <lgl>, profile_url <chr>, account_lang <chr>,
#> #   profile_banner_url <chr>, profile_image_url <chr>, bbox_coords <list>
#>
#> $`471677441`
#> # A tibble: 20 x 21
#>    user_id observation_type timestamp_ms        name  screen_name location
#>    <chr>   <chr>            <dttm>              <chr> <chr>       <chr>
#>  1 471677… retweet          2019-09-28 18:05:41 Geor… gtconway3d  Origina…
#>  2 471677… mentions         2019-09-28 18:05:41 NA    gtconway3d  NA
#>  3 471677… retweet          2019-09-28 18:06:38 Geor… gtconway3d  Origina…
#>  4 471677… mentions         2019-09-28 18:06:38 NA    gtconway3d  NA
#>  5 471677… retweet          2019-09-28 18:06:52 Geor… gtconway3d  Origina…
#>  6 471677… mentions         2019-09-28 18:06:52 NA    gtconway3d  NA
#>  7 471677… retweet          2019-09-28 18:08:07 Geor… gtconway3d  Origina…
#>  8 471677… mentions         2019-09-28 18:08:07 NA    gtconway3d  NA
#>  9 471677… retweet          2019-09-28 18:08:18 Geor… gtconway3d  Origina…
#> 10 471677… mentions         2019-09-28 18:08:18 NA    gtconway3d  NA
#> 11 471677… quoted           2019-09-28 18:11:41 Geor… gtconway3d  Origina…
#> 12 471677… quoted           2019-09-28 18:13:03 Geor… gtconway3d  Origina…
#> 13 471677… retweet          2019-09-28 18:13:06 Geor… gtconway3d  Origina…
#> 14 471677… mentions         2019-09-28 18:13:06 NA    gtconway3d  NA
#> 15 471677… retweet          2019-09-28 18:14:20 Geor… gtconway3d  Origina…
#> 16 471677… mentions         2019-09-28 18:14:20 NA    gtconway3d  NA
#> 17 471677… retweet          2019-09-28 18:14:32 Geor… gtconway3d  Origina…
#> 18 471677… mentions         2019-09-28 18:14:32 NA    gtconway3d  NA
#> 19 471677… retweet          2019-09-28 18:15:04 Geor… gtconway3d  Origina…
#> 20 471677… mentions         2019-09-28 18:15:04 NA    gtconway3d  NA
#> # … with 15 more variables: description <chr>, url <chr>, protected <lgl>,
#> #   followers_count <int>, friends_count <int>, listed_count <int>,
#> #   statuses_count <int>, favourites_count <int>, account_created_at <dttm>,
#> #   verified <lgl>, profile_url <chr>, account_lang <chr>,
#> #   profile_banner_url <chr>, profile_image_url <chr>, bbox_coords <list>
#>
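The list returned with split = TRUE is named by user_id and, judging from the output above, each element carries an observation_type column. A minimal sketch of summarizing such a list with base R follows. The split_users object below is a hypothetical stand-in built by hand so the snippet runs without {tweetio}, and it contains only two of the columns.

```r
# Hypothetical stand-in for the list returned with split = TRUE:
# one data frame per user, named by user_id.
split_users <- list(
  `111` = data.frame(
    user_id          = "111",
    observation_type = c("retweet", "mentions", "quoted"),
    stringsAsFactors = FALSE
  ),
  `222` = data.frame(
    user_id          = "222",
    observation_type = c("retweet", "mentions", "retweet", "mentions"),
    stringsAsFactors = FALSE
  )
)

# Number of observations per user (a named integer vector).
n_obs <- vapply(split_users, nrow, integer(1L))

# How often each observation type occurs for each user.
type_counts <- lapply(split_users, function(.x) table(.x$observation_type))

n_obs
type_counts
```

With a real result, the same vapply() and lapply() calls should apply unchanged, provided the observation_type column is present as shown in the example output.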