Skip to main content

Replace values based on ID's for determinate columns

I have this situation:

ID          n  post    date    el    a   b   c   d
100_left    4   50  10/11/2020  y   190 5.41 4  300
100_right   4   50  10/11/2020  n       5.4  5  200
101_left    4   50  10/11/2020  y   180 5.49 6  360
101_right   4   50  10/11/2020  n       5.48 6  180
102_left    4   50  10/11/2020  y   190 5.5  3  300
102_right   4   50  10/11/2020  n       5.46 5  200
103_left    4   50  10/11/2020  y   190 5.39 3  170
103_right   4   50  10/11/2020  n       5.44 3  360

I would like to use the same values of n_left for n_right (ID), but just for columns from a to d. Like this:

ID          n  post   date     el   a    b   c   d
100_left    4   50  10/11/2020  y   190 5.41 4  300
100_right   4   50  10/11/2020  n   190 5.41 4  300
101_left    4   50  10/11/2020  y   180 5.49 6  360
101_right   4   50  10/11/2020  n   180 5.49 6  360
102_left    4   50  10/11/2020  y   190 5.5  3  300
102_right   4   50  10/11/2020  n   190 5.5  3  300
103_left    4   50  10/11/2020  y   190 5.39 3  170
103_right   4   50  10/11/2020  n   190 5.39 3  170

Sorry for not putting codes but I don't know where to start. OBS: This is just a sample, my original dataset is much bigger.

Answer

We may group by the prefix part of 'ID' after removing the characters from _ with str_remove, then mutate across the columns 'a' to 'd', by selecting the values where 'el' is 'y'

library(dplyr)
library(stringr)
df1 %>% 
  dplyr::group_by(grp = stringr::str_remove(ID, "_.*")) %>% 
  dplyr::mutate(across(a:d, ~ .[el == 'y'])) %>%
  ungroup %>% 
  dplyr::select(-grp)

-output

# A tibble: 8 × 9
  ID            n  post date       el        a     b     c     d
  <chr>     <int> <int> <chr>      <chr> <int> <dbl> <int> <int>
1 100_left      4    50 10/11/2020 y       190  5.41     4   300
2 100_right     4    50 10/11/2020 n       190  5.41     4   300
3 101_left      4    50 10/11/2020 y       180  5.49     6   360
4 101_right     4    50 10/11/2020 n       180  5.49     6   360
5 102_left      4    50 10/11/2020 y       190  5.5      3   300
6 102_right     4    50 10/11/2020 n       190  5.5      3   300
7 103_left      4    50 10/11/2020 y       190  5.39     3   170
8 103_right     4    50 10/11/2020 n       190  5.39     3   170

data

df1 <- structure(list(ID = c("100_left", "100_right", "101_left", "101_right", 
"102_left", "102_right", "103_left", "103_right"), n = c(4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L), post = c(50L, 50L, 50L, 50L, 50L, 
50L, 50L, 50L), date = c("10/11/2020", "10/11/2020", "10/11/2020", 
"10/11/2020", "10/11/2020", "10/11/2020", "10/11/2020", "10/11/2020"
), el = c("y", "n", "y", "n", "y", "n", "y", "n"), a = c(190L, 
NA, 180L, NA, 190L, NA, 190L, NA), b = c(5.41, 5.4, 5.49, 5.48, 
5.5, 5.46, 5.39, 5.44), c = c(4L, 5L, 6L, 6L, 3L, 5L, 3L, 3L), 
    d = c(300L, 200L, 360L, 180L, 300L, 200L, 170L, 360L)), 
class = "data.frame", row.names = c(NA, 
-8L))

Comments