`build_messages()` reconciles and merges the messages from `im_orig_dfs$orig_message_posts` and `im_core_dfs$core_message_posts`.
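You can picture the merge as stacking the two tables and recording each row's source. The `which_df` column (`"core"` or `"orig"`) in the `summarize = FALSE` example appears to do exactly that. The base-R sketch below illustrates the idea on small hypothetical stand-ins (`orig_posts`, `core_posts`, and the helper `stack_sources()`). It is not the package's implementation, and it skips any column harmonization the real tables may need before they can be combined.

```r
# Toy stand-ins for the two sources (hypothetical data, not the real tables).
orig_posts <- data.frame(
  msg_id   = c(1L, 2L, 9L),
  msg_post = c("First post", "Second post", "Only in orig"),
  stringsAsFactors = FALSE
)
core_posts <- data.frame(
  msg_id   = c(1L, 2L, 7L),
  msg_post = c("<p>First post</p>", "<p>Second post</p>", "<p>Only in core</p>"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: tag each row with its source, then stack both tables.
stack_sources <- function(core, orig) {
  # rep() keeps this safe for zero-row inputs.
  core$which_df <- rep("core", nrow(core))
  orig$which_df <- rep("orig", nrow(orig))
  combined <- rbind(core, orig)
  # Order by message ID; "core" sorts ahead of "orig" within each ID.
  combined <- combined[order(combined$msg_id, combined$which_df), ]
  rownames(combined) <- NULL
  # Put the source column first, as in the printed examples.
  combined[, c("which_df", setdiff(names(combined), "which_df")), drop = FALSE]
}

all_obs <- stack_sources(core_posts, orig_posts)
all_obs
```

Messages 1 and 2 end up with two rows each (one per source). Messages 7 and 9 each have a single row because only one toy source contains them.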
```r
build_messages(summarize = TRUE, split = FALSE, as_tibble = ironmarch_as_tibble())
```
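Argument | Description
---|---
summarize | `TRUE` or `FALSE`. If `TRUE` (the default), return one row per message. If `FALSE`, return every message observation from both sources, with a `which_df` column recording whether each row came from `"core"` or `"orig"`.
split | `TRUE` or `FALSE` (default `FALSE`). If `TRUE` and `summarize` is `FALSE`, split the observations into a `list()` of data frames, one per `msg_id`.
as_tibble | `TRUE` or `FALSE`. Whether to return a `tibble::tibble()` (requires {tibble}) instead of a `data.table::data.table()`. Defaults to `ironmarch_as_tibble()`.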
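The examples show what `summarize = TRUE` produces. It drops `which_df`, it has fewer rows (22,309) than the full set of observations (35,056), and its `msg_post` values carry the HTML seen in the `core` rows. The base-R sketch below illustrates that collapse. It assumes the `core` row wins whenever a message appears in both sources. The toy table `all_obs` and the helper `summarize_obs()` are hypothetical, and the package may reconcile individual columns differently.

```r
# Hypothetical long table shaped like the `summarize = FALSE` output.
all_obs <- data.frame(
  which_df = c("core", "orig", "core", "orig", "core", "orig"),
  msg_id   = c(1L, 1L, 2L, 2L, 7L, 9L),
  msg_post = c("<p>First post</p>", "First post",
               "<p>Second post</p>", "Second post",
               "<p>Only in core</p>", "Only in orig"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse to one row per msg_id, keeping the "core" row
# when both sources have one (an assumption, not documented behavior).
summarize_obs <- function(obs) {
  # FALSE sorts before TRUE, so "core" rows come first within each msg_id.
  obs <- obs[order(obs$msg_id, obs$which_df != "core"), ]
  # duplicated() flags every row after the first for a given msg_id.
  keep_cols <- setdiff(names(obs), "which_df")
  out <- obs[!duplicated(obs$msg_id), keep_cols, drop = FALSE]
  rownames(out) <- NULL
  out
}

summarize_obs(all_obs)
```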
Data frame:

- default: a `tibble::tibble()` if `as_tibble` is `TRUE` and {tibble} is installed.
- alternative: a `data.table::data.table()` otherwise.

If `split` is `TRUE` and `summarize` is `FALSE`, the result is a `list()` of data frames.
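The list returned by `split = TRUE` appears to have the same shape as base R's `split()` applied to the observation table by `msg_id`. Each element holds the rows for one message and is named by its ID, as `split_messages$\`1\`` in the examples shows. The sketch below reproduces that shape on a hypothetical toy table. It assumes the grouping key is `msg_id`, which the examples suggest but this page does not state.

```r
# Hypothetical long table shaped like the `summarize = FALSE` output.
all_obs <- data.frame(
  which_df = c("core", "orig", "core", "orig", "core", "orig"),
  msg_id   = c(1L, 1L, 2L, 2L, 7L, 9L),
  msg_post = c("<p>First post</p>", "First post",
               "<p>Second post</p>", "Second post",
               "<p>Only in core</p>", "Only in orig"),
  stringsAsFactors = FALSE
)

# One data frame per msg_id; the list names are the IDs ("1", "2", "7", "9").
split_obs <- split(all_obs, all_obs$msg_id)

# Keep only messages found in both sources (two rows each),
# using the same vapply() filter as the examples.
in_both <- split_obs[vapply(split_obs, function(.x) nrow(.x) == 2, logical(1L))]
names(in_both)
```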
```r
# messages summarized to single rows ==================================================
build_messages()
#> # A tibble: 22,309 x 8
#>    msg_id msg_topic_id msg_date            msg_post msg_post_key msg_author_id
#>     <int>        <int> <dttm>              <chr>    <chr>                <int>
#>  1      1            1 2011-09-16 03:49:58 "<p>The… 3320f7f06c4…             1
#>  2      2            2 2011-09-16 11:54:08 "\n<p>W… 9204e488332…            11
#>  3      3            2 2011-09-16 14:39:59 "<p>Cri… 12fd0309239…             1
#>  4      4            2 2011-09-16 15:29:01 "<p>Tha… 0658c6f99ac…            11
#>  5      5            2 2011-09-16 15:32:58 "<p>If … 570257864e3…             1
#>  6      6            2 2011-09-16 15:44:51 "<p>The… aabeacc8f4c…            11
#>  7      7            3 2011-09-17 01:43:49 "<p>I d… a36f67c0d72…            16
#>  8      8            3 2011-09-17 01:59:50 "\n<blo… 327b933d818…            14
#>  9     12            5 2011-09-20 14:20:14 "<p>The… 0667258c387…             1
#> 10     13            5 2011-09-20 14:42:17 "<p>Don… de1d7fd2737…            20
#> # … with 22,299 more rows, and 2 more variables: msg_ip_address <chr>,
#> #   msg_is_first_post <lgl>

# all message observations ============================================================
build_messages(summarize = FALSE)
#> # A tibble: 35,056 x 9
#>    which_df msg_id msg_topic_id msg_date            msg_post msg_post_key
#>    <chr>     <int>        <int> <dttm>              <chr>    <chr>
#>  1 core          1            1 2011-09-16 03:49:58 "<p>The… 3320f7f06c4…
#>  2 core          2            2 2011-09-16 11:54:08 "\n<p>W… 9204e488332…
#>  3 core          3            2 2011-09-16 14:39:59 "<p>Cri… 12fd0309239…
#>  4 core          4            2 2011-09-16 15:29:01 "<p>Tha… 0658c6f99ac…
#>  5 core          5            2 2011-09-16 15:32:58 "<p>If … 570257864e3…
#>  6 core          6            2 2011-09-16 15:44:51 "<p>The… aabeacc8f4c…
#>  7 core          7            3 2011-09-17 01:43:49 "<p>I d… a36f67c0d72…
#>  8 core          8            3 2011-09-17 01:59:50 "\n<blo… 327b933d818…
#>  9 core         12            5 2011-09-20 14:20:14 "<p>The… 0667258c387…
#> 10 core         13            5 2011-09-20 14:42:17 "<p>Don… de1d7fd2737…
#> # … with 35,046 more rows, and 3 more variables: msg_author_id <int>,
#> #   msg_ip_address <chr>, msg_is_first_post <lgl>

# all message observations split into a list of data frames ===========================
split_messages <- build_messages(summarize = FALSE, split = TRUE)

# first five messages that appear in both... ==========================================
# ... `im_orig_dfs$orig_message_posts` and `im_core_dfs$core_message_posts`
split_messages[
  vapply(split_messages, function(.x) nrow(.x) == 2, logical(1L))
][1:5]
#> $`1`
#> # A tibble: 2 x 9
#>   which_df msg_id msg_topic_id msg_date            msg_post msg_post_key
#>   <chr>     <int>        <int> <dttm>              <chr>    <chr>
#> 1 core          1            1 2011-09-16 03:49:58 <p>The … 3320f7f06c4…
#> 2 orig          1            1 2011-09-16 03:49:58 The bes… 3320f7f06c4…
#> # … with 3 more variables: msg_author_id <int>, msg_ip_address <chr>,
#> #   msg_is_first_post <lgl>
#>
#> $`2`
#> # A tibble: 2 x 9
#>   which_df msg_id msg_topic_id msg_date            msg_post msg_post_key
#>   <chr>     <int>        <int> <dttm>              <chr>    <chr>
#> 1 core          2            2 2011-09-16 11:54:08 "\n<p>W… 9204e488332…
#> 2 orig          2            2 2011-09-16 11:54:08 "Who ar… 9204e488332…
#> # … with 3 more variables: msg_author_id <int>, msg_ip_address <chr>,
#> #   msg_is_first_post <lgl>
#>
#> $`3`
#> # A tibble: 2 x 9
#>   which_df msg_id msg_topic_id msg_date            msg_post msg_post_key
#>   <chr>     <int>        <int> <dttm>              <chr>    <chr>
#> 1 core          3            2 2011-09-16 14:39:59 <p>Cris… 12fd0309239…
#> 2 orig          3            2 2011-09-16 14:39:59 Crisis … 12fd0309239…
#> # … with 3 more variables: msg_author_id <int>, msg_ip_address <chr>,
#> #   msg_is_first_post <lgl>
#>
#> $`4`
#> # A tibble: 2 x 9
#>   which_df msg_id msg_topic_id msg_date            msg_post msg_post_key
#>   <chr>     <int>        <int> <dttm>              <chr>    <chr>
#> 1 core          4            2 2011-09-16 15:29:01 <p>Than… 0658c6f99ac…
#> 2 orig          4            2 2011-09-16 15:29:01 Thank y… 0658c6f99ac…
#> # … with 3 more variables: msg_author_id <int>, msg_ip_address <chr>,
#> #   msg_is_first_post <lgl>
#>
#> $`5`
#> # A tibble: 2 x 9
#>   which_df msg_id msg_topic_id msg_date            msg_post msg_post_key
#>   <chr>     <int>        <int> <dttm>              <chr>    <chr>
#> 1 core          5            2 2011-09-16 15:32:58 <p>If y… 570257864e3…
#> 2 orig          5            2 2011-09-16 15:32:58 If you … 570257864e3…
#> # … with 3 more variables: msg_author_id <int>, msg_ip_address <chr>,
#> #   msg_is_first_post <lgl>
#>
```