lubridate::mdy()の挙動に関するメモ
はじめに
lubridate::mdy()
の挙動で知らないことがあったのでメモ。
lubridate::mdy()
lubridate::mdy()は簡単にいうと、年日月という形式のデータを日付に変換してくれる関数。なんだけど、渡す文字列によっては想定どおりに変換されないので注意。"December 4 2020"
は2020年12月4日に変換されてほしいが日の要素の頭に空白
または0
がないとうまく変換されない。
packageVersion("lubridate") [1] ‘1.7.9’ mdy("December 4 2020") [1] "2020-04-20" mdy("December 4 2020") [1] "2020-12-04" mdy("December 04 2020") [1] "2020-12-04"
例えば、December 4, 2020 at 04:44PM
という形式のデータがあると、年月日の部分だけ取り出して、lubridate::mdy()
を使うと痛い目にあう。
df <- tibble(date_hm = c("December 4, 2020 at 04:44PM", "December 4, 2020 at 04:44PM", "December 04, 2020 at 04:44PM", "April 4, 2020 at 04:44PM", "April 4, 2020 at 04:44PM", "April 04, 2020 at 04:44PM"), date = c("December 4, 2020", "December 4, 2020", "December 04, 2020", "April 4, 2020", "April 4, 2020", "April 04, 2020") ) df %>% dplyr::mutate(test_mdy_hm = lubridate::mdy_hm(date_hm), test_mdy = lubridate::mdy(date)) %>% dplyr::select(date_hm, test_mdy_hm, date, test_mdy) # A tibble: 6 x 4 date_hm test_mdy_hm date test_mdy <chr> <dttm> <chr> <date> 1 December 4, 2020 at 04:44PM 2020-12-04 16:44:00 December 4, 2020 2020-04-20 # No 2 December 4, 2020 at 04:44PM 2020-12-04 16:44:00 December 4, 2020 2020-12-04 3 December 04, 2020 at 04:44PM 2020-12-04 16:44:00 December 04, 2020 2020-12-04 4 April 4, 2020 at 04:44PM 2020-04-04 16:44:00 April 4, 2020 2020-04-20 # No 5 April 4, 2020 at 04:44PM 2020-04-04 16:44:00 April 4, 2020 2020-04-04 6 April 04, 2020 at 04:44PM 2020-04-04 16:44:00 April 04, 2020 2020-04-04
ということで、暫定的に下記のように面倒だけどコネコネして変換する。ほかにもっといい方法はあると思う。
library(googlesheets4) library(tidyverse) library(lubridate) sheets_auth() d <- read_sheet( ss = "1ZKHN9PeuApHGzetxptpYz0C3K0uvfT5OVX3gWKw-poY", range = "シート1!A1:C154" ) %>% dplyr::select(-unit) d %>% dplyr::mutate( date0 = stringr::str_replace(date, pattern = '[:space:]at.*', ''), date1 = stringr::str_replace(date0, pattern = ',', '') ) %>% tidyr::separate(date1, c("mm", "dd", "yyyy")) %>% dplyr::mutate(date_mod = ymd(paste0(yyyy,"/",mm,"/",dd))) %>% dplyr::select(date, date_mod, weight) # A tibble: 153 x 3 date date_mod weight <chr> <date> <dbl> 1 July 5, 2020 at 10:44PM 2020-07-05 88.4 2 July 6, 2020 at 18:53PM 2020-07-06 86.8 3 July 7, 2020 at 18:40PM 2020-07-07 86.2 4 July 8, 2020 at 18:37PM 2020-07-08 86.0 5 July 9, 2020 at 20:43PM 2020-07-09 85.2 6 July 10, 2020 at 20:21PM 2020-07-10 84.9 7 July 11, 2020 at 20:20PM 2020-07-11 84.7 8 July 12, 2020 at 19:1PM 2020-07-12 84 9 July 13, 2020 at 20:23PM 2020-07-13 83.4 10 July 14, 2020 at 18:57PM 2020-07-14 83.1 # … with 143 more rows