Previously, we covered side-effect piping: if an enclosed expression that follows %>>% starts with ~, the operator evaluates the expression only for its side effect.
Functions with side effects fall into several categories. So far we have shown printing (print()), messaging (cat(), message()), and graphics (plot()).
In addition to printing and plotting, one may need to save an intermediate value to the environment by assigning it to a variable (or symbol). Assignment is perhaps the most important side effect of all. Just imagine a version of R in which we cannot assign.
Assignment as a side effect therefore deserves its own, simpler syntax. To assign a value to a symbol, insert a step such as (~ symbol); the input value of that step is then assigned to symbol in the current environment.
This syntax is probably the simplest case of a side effect. Since evaluating a bare symbol for its side effect rarely makes sense, the step is instead interpreted as assigning the input value to the given symbol.
mtcars %>>%
subset(select = c(mpg, wt, cyl)) %>>%
(~ sub_mtcars) %>>% # assign subsetted mtcars to sub_mtcars
lm(formula = mpg ~ wt + cyl) %>>%
(~ lm_mtcars) %>>% # assign linear model to lm_mtcars
summary
#
# Call:
# lm(formula = mpg ~ wt + cyl, data = .)
#
# Residuals:
# Min 1Q Median 3Q Max
# -4.2893 -1.5512 -0.4684 1.5743 6.1004
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 39.6863 1.7150 23.141 < 2e-16 ***
# wt -3.1910 0.7569 -4.216 0.000222 ***
# cyl -1.5078 0.4147 -3.636 0.001064 **
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 2.568 on 29 degrees of freedom
# Multiple R-squared: 0.8302, Adjusted R-squared: 0.8185
# F-statistic: 70.91 on 2 and 29 DF, p-value: 6.809e-12
Then we can inspect the environment and see what is in it.
ls.str()
# lm_mtcars : List of 12
# $ coefficients : Named num [1:3] 39.69 -3.19 -1.51
# $ residuals : Named num [1:32] -1.279 -0.465 -3.452 1.019 2.053 ...
# $ effects : Named num [1:32] -113.65 -29.12 -9.34 1.33 1.6 ...
# $ rank : int 3
# $ fitted.values: Named num [1:32] 22.3 21.5 26.3 20.4 16.6 ...
# $ assign : int [1:3] 0 1 2
# $ qr :List of 5
# $ df.residual : int 29
# $ xlevels : Named list()
# $ call : language lm(formula = mpg ~ wt + cyl, data = .)
# $ terms :Classes 'terms', 'formula' length 3 mpg ~ wt + cyl
# $ model :'data.frame': 32 obs. of 3 variables:
# sub_mtcars : 'data.frame': 32 obs. of 3 variables:
# $ mpg: num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
# $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
# $ cyl: num 6 6 4 6 8 6 8 4 4 6 ...
These two variables are exactly the intermediate results we wanted to save to the environment.
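For comparison, the same work can be sketched in base R without a pipeline. This is only an illustration of what the two (~ symbol) steps amount to, namely ordinary assignments of each intermediate value. It is not how pipeR implements them. One visible difference is that the recorded model call here refers to sub_mtcars rather than to the dot.

```r
# Base R sketch of the pipeline above: each (~ symbol) step corresponds to
# a plain assignment of the value produced by the previous step.
sub_mtcars <- subset(mtcars, select = c(mpg, wt, cyl))  # subsetted data
lm_mtcars <- lm(mpg ~ wt + cyl, data = sub_mtcars)      # fitted linear model
summary(lm_mtcars)                                      # final value of the chain
```

The pipeline version avoids repeating the intermediate names as function arguments, while still leaving both objects in the environment.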
Sometimes, however, we do not want to save the input value itself but the value after some transformation. In that case we can use = with a lambda expression to specify what is saved. In pipeR v0.5, <- and the more natural -> are allowed in assignment too, which may make the code more readable.
mtcars %>>%
subset(select = c(mpg, wt, cyl)) %>>%
(~ summ = summary(.)) %>>% # side-effect assignment
lm(formula = mpg ~ wt + cyl)
#
# Call:
# lm(formula = mpg ~ wt + cyl, data = .)
#
# Coefficients:
# (Intercept) wt cyl
# 39.686 -3.191 -1.508
Then we can see that summ is saved to the environment.
summ
# mpg wt cyl
# Min. :10.40 Min. :1.513 Min. :4.000
# 1st Qu.:15.43 1st Qu.:2.581 1st Qu.:4.000
# Median :19.20 Median :3.325 Median :6.000
# Mean :20.09 Mean :3.217 Mean :6.188
# 3rd Qu.:22.80 3rd Qu.:3.610 3rd Qu.:8.000
# Max. :33.90 Max. :5.424 Max. :8.000
Just as a side-effect expression can be a lambda expression, so can the expression assigned after =.
mtcars %>>%
subset(select = c(mpg, wt, cyl)) %>>%
(~ summ = df ~ summary(df)) %>>% # side-effect assignment
lm(formula = mpg ~ wt + cyl)
Note that all of the assignment operations above work purely as side effects: they do not influence the value being piped. In other words, if these lines are removed, the input value continues through the pipeline unchanged, but is no longer assigned to the given symbol.
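A small sketch can make this concrete. It assumes the pipeR package is installed. The names total, with_step and without_step are made up for this illustration.

```r
library(pipeR)

# With the side-effect step: `total` is assigned, but the original vector
# (not the sum) is what reaches length().
with_step <- 1:10 %>>%
  (~ total = sum(.)) %>>%
  length

# Without the step: the piped value is the same, only the assignment is gone.
without_step <- 1:10 %>>%
  length

identical(with_step, without_step)  # both pipelines end with the same value
total                               # the sum saved by the side-effect step
```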
What if one really wants the result not only to be assigned to a symbol but also to continue the flow to the next expression?
Two methods meet the demand:
1. Insert (~ symbol) after the expression whose value should be assigned.
2. Use (symbol = expression) to assign the value of expression to symbol.
Note that the second method is new here, but it should look natural because it is easily distinguished from (~ symbol = expression), which is only for side effect.
mtcars %>>%
subset(select = c(mpg, wt, cyl)) %>>%
(~ summ = df ~ summary(df)) %>>% # side-effect assignment
(model = lm(mpg ~ wt + cyl, data = .)) # pipe and assign
To verify the assignment, evaluate model.
model
#
# Call:
# lm(formula = mpg ~ wt + cyl, data = .)
#
# Coefficients:
# (Intercept) wt cyl
# 39.686 -3.191 -1.508
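The first method, computing the value in a normal step and then saving it with (~ symbol), can be sketched as follows. This assumes pipeR is installed. The names model2 and coefs are chosen only for this illustration.

```r
library(pipeR)

coefs <- mtcars %>>%
  subset(select = c(mpg, wt, cyl)) %>>%
  lm(formula = mpg ~ wt + cyl) %>>%  # compute the model as an ordinary step
  (~ model2) %>>%                    # save the model as a side effect
  coef                               # the model itself keeps flowing to coef()

class(model2)  # the saved object is the fitted model
coefs          # the pipeline result is its coefficient vector
```

Both methods leave the model in the environment and pass it on. The second method simply merges the two steps into one.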
In pipeR v0.5, the assignment operators <- and -> can be used for this job as well. The main merit of a pipeline is its readability, and a contributing factor is that the function in each step is immediately visible, so that one can easily figure out what the code does. The = syntax for assignment weakens readability to some extent because the function is pushed behind the symbol, which does not happen when -> is used for assignment.
mtcars %>>%
(~ summary(.) -> summ)
mtcars %>>%
(~ summ <- summary(.))
The (~ expression -> symbol) and (~ symbol <- expression) syntaxes work for side-effect assignment, while (expression -> symbol) and (symbol <- expression) work for piping with assignment.
mtcars %>>%
(~ summary(.) -> summ) %>>% # side-effect assignment
(lm(formula = mpg ~ wt + cyl, data = .) -> lm_mtcars) %>>% # continue piping
summary
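The same pipeline can also be written with the leftward operator. The sketch below assumes pipeR v0.5 or later is installed. The name fit_summary is introduced only to hold the final result.

```r
library(pipeR)

fit_summary <- mtcars %>>%
  (~ summ <- summary(.)) %>>%                                      # side-effect assignment
  (lm_mtcars <- lm(formula = mpg ~ wt + cyl, data = .)) %>>%       # assign and continue piping
  summary

class(summ)         # summary of the data frame, saved on the side
class(lm_mtcars)    # the fitted model, saved while being piped on
class(fit_summary)  # summary of the model, the value of the whole pipeline
```

Whether -> or <- reads better is largely a matter of taste. The former keeps the function at the start of each step.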
Beyond the examples above, the assignment feature is more powerful than has been demonstrated so far. The assignment operators =, <- and -> also support subset and element assignment. For example,
results <- list()
mtcars %>>%
lm(formula = mpg ~ wt + cyl) %>>%
(~ results$mtcars = . %>>% summary %>>% (r.squared))
#
# Call:
# lm(formula = mpg ~ wt + cyl, data = .)
#
# Coefficients:
# (Intercept) wt cyl
# 39.686 -3.191 -1.508
iris %>>%
lm(formula = Sepal.Length ~ Sepal.Width) %>>%
(~ results$iris = . %>>% summary %>>% (r.squared))
#
# Call:
# lm(formula = Sepal.Length ~ Sepal.Width, data = .)
#
# Coefficients:
# (Intercept) Sepal.Width
# 6.5262 -0.2234
Then we can print the results and see the values in it.
results
# $mtcars
# [1] 0.8302274
#
# $iris
# [1] 0.01382265
Similar code works with -> or <- too, which can be more natural and less disruptive in a pipeline.
set.seed(0)
results <- numeric()
rnorm(100) %>>%
(~ mean(.) -> results["mean"]) %>>%
(~ median(.) -> results["median"]) %>>%
summary
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -2.22400 -0.56940 -0.03296 0.02267 0.62540 2.44100
Print results to show the values in it.
results
# mean median
# 0.02266845 -0.03296148
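For readers more used to plain R, the element assignments above correspond roughly to the following base R code. It is a sketch of the effect only, with a hypothetical intermediate variable x that the pipeline version does not need.

```r
set.seed(0)
x <- rnorm(100)               # the value that flows through the pipeline

results <- numeric()
results["mean"] <- mean(x)      # what (~ mean(.) -> results["mean"]) does
results["median"] <- median(x)  # what (~ median(.) -> results["median"]) does

summary(x)                    # the final step still sees the original vector
results
```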
Beyond simply assigning values to symbols, the assignment target can also be an expression that sets names or other properties of an object.
numbers <- 1:5
letters %>>%
sample(length(numbers)) %>>%
(~ . -> names(numbers))
# [1] "u" "g" "f" "l" "x"
Now the names of numbers are the randomly sampled letters.
numbers
# u g f l x
# 1 2 3 4 5