List filtering is to select list elements by given criteria. In rlist package, more than ten functions are related with list filtering. Basically, they all perform mapping first but then aggregate the results in different ways.
First, we load the sample data.
library(rlist)
people <- list.load("https://renkun-ken.github.io/rlist-tutorial/data/sample.json")
list.filter()
filters a list by an expression that returns TRUE
or FALSE
. The results only contain the list elements for which the value of that expression turns out to be TRUE
.
Different from list mapping which evaluates an expression given each list element, list filtering evaluates an expression to decide whether to include the entire element in the results.
str(list.filter(people, Age >= 25))
# List of 1
# $ :List of 4
# ..$ Name : chr "James"
# ..$ Age : int 25
# ..$ Interests: chr [1:2] "sports" "music"
# ..$ Expertise:List of 3
# .. ..$ R : int 3
# .. ..$ Java: int 2
# .. ..$ Cpp : int 5
Note that list.filter()
filters the data with given conditions and the list elements that satisfy the conditions will be returned. We call str()
on the results to shorten the output.
Using pipeline, we can first filter the data and then map the resulted elements by expression. For example, we can get the names of those whose age is no less than 25.
library(pipeR)
people %>>%
list.filter(Age >= 25) %>>%
list.mapv(Name)
# [1] "James"
If one has to write the code in traditional approach, it can be
list.mapv(list.filter(people, Age >= 25), Name)
or
people_filtered <- list.filter(people, Age >= 25)
list.mapv(people_filtered, Name)
It is obvious that both versions are quite redundant. Therefore, we will heavily use pipeline in the demonstration the features from now on to make the data processing look more elegant, and reduce the amount of information in the output.
Similarly, we can also get the names of those who are interested in music.
people %>>%
list.filter("music" %in% Interests) %>>%
list.mapv(Name)
# [1] "Ken" "James"
We can get the names of those who have been using programming languages for at least three years on average.
people %>>%
list.filter(mean(as.numeric(Expertise)) >= 3) %>>%
list.mapv(Name)
# [1] "Ken" "James"
Meta-symbols like .
, .i
, and .name
can also be used. The following code will pick up the list element whose index is even.
people %>>%
list.filter(.i %% 2 == 0) %>>%
list.mapv(Name)
# [1] "James"
In some cases, we don't need to find all the instances given the criteria. Rather, we only need to find a few, sometimes only one. list.find()
avoids searching across all list element but stops at a specific number of items found.
people %>>%
list.find(Age >= 25, 1) %>>%
list.mapv(Name)
# [1] "James"
Similar with list.find()
, list.findi()
only returns the index of the elements found.
list.findi(people, Age >= 23, 2)
# [1] 1 2
You may verify that if the number of instances to find is greater than the actual number of instances in the data, all qualified instances will be returned.
list.first()
and list.last()
are used to find the first and last element that meets certain condition if specified, respectively.
str(list.first(people, Age >= 23))
# List of 4
# $ Name : chr "Ken"
# $ Age : int 24
# $ Interests: chr [1:3] "reading" "music" "movies"
# $ Expertise:List of 3
# ..$ R : int 2
# ..$ CSharp: int 4
# ..$ Python: int 3
str(list.last(people, Age >= 23))
# List of 4
# $ Name : chr "Penny"
# $ Age : int 24
# $ Interests: chr [1:2] "movies" "reading"
# $ Expertise:List of 3
# ..$ R : int 1
# ..$ Cpp : int 4
# ..$ Python: int 2
These two functions also works when the condition is missing. In this case, they simply take out the first/last element from the list or vector.
list.first(1:10)
# [1] 1
list.last(1:10)
# [1] 10
list.take()
takes at most a given number of elements from a list. If the number is even larger than the length of the list, the function will by default return all elements in the list.
list.take(1:10, 3)
# [1] 1 2 3
list.take(1:5, 8)
# [1] 1 2 3 4 5
As opposed to list.take()
, list.skip()
skips at most a given number of elements in the list and take all the rest as the results. If the number of elements to skip is equal or greater than the length of that list, an empty one will be returned.
list.skip(1:10, 3)
# [1] 4 5 6 7 8 9 10
list.skip(1:5, 8)
# integer(0)
Similar to list.take()
, list.takeWhile()
is also designed to take out some elements from a list but subject to a condition. Basically, it keeps taking elements while a condition holds true.
people %>>%
list.takeWhile(Expertise$R >= 2) %>>%
list.map(list(Name = Name, R = Expertise$R)) %>>%
str
# List of 2
# $ :List of 2
# ..$ Name: chr "Ken"
# ..$ R : int 2
# $ :List of 2
# ..$ Name: chr "James"
# ..$ R : int 3
list.skipWhile()
keeps skipping elements while a condition holds true.
people %>>%
list.skipWhile(Expertise$R <= 2) %>>%
list.map(list(Name = Name, R = Expertise$R)) %>>%
str
# List of 2
# $ :List of 2
# ..$ Name: chr "James"
# ..$ R : int 3
# $ :List of 2
# ..$ Name: chr "Penny"
# ..$ R : int 1
list.is()
returns a logical vector that indicates whether a condition holds for each member of a list.
list.is(people, "music" %in% Interests)
# [1] TRUE TRUE FALSE
list.is(people, "Java" %in% names(Expertise))
# [1] FALSE TRUE FALSE
list.which()
returns a integer vector of the indices of the elements of a list that meet a given condition.
list.which(people, "music" %in% Interests)
# [1] 1 2
list.which(people, "Java" %in% names(Expertise))
# [1] 2
list.all()
returns TRUE
if all the elements of a list satisfy a given condition, or FALSE
otherwise.
list.all(people, mean(as.numeric(Expertise)) >= 3)
# [1] FALSE
list.all(people, "R" %in% names(Expertise))
# [1] TRUE
list.any()
returns TRUE
if at least one of the elements of a list satisfies a given condition, or FALSE
otherwise.
list.any(people, mean(as.numeric(Expertise)) >= 3)
# [1] TRUE
list.any(people, "Python" %in% names(Expertise))
# [1] TRUE
list.count()
return a scalar integer that indicates the number of elements of a list that satisfy a given condition.
list.count(people, mean(as.numeric(Expertise)) >= 3)
# [1] 2
list.count(people, "R" %in% names(Expertise))
# [1] 3
list.match()
filters a list by matching the names of the list elements by a regular expression pattern.
data <- list(p1 = 1, p2 = 2, a1 = 3, a2 = 4)
list.match(data, "p[12]")
# $p1
# [1] 1
#
# $p2
# [1] 2
list.remove()
removes list elements by index or name.
list.remove(data, c("p1","p2"))
# $a1
# [1] 3
#
# $a2
# [1] 4
list.remove(data, c(2,3))
# $p1
# [1] 1
#
# $a2
# [1] 4
list.exclude()
removes list elements that satisfy given condition.
people %>>%
list.exclude("sports" %in% Interests) %>>%
list.mapv(Name)
# [1] "Ken" "Penny"
list.clean()
is used to clean a list by a function either recursively or not. The function can be built-in function like is.null()
to remove all NULL
values from the list, or can be user-defined function like function(x) length(x) == 0
to remove all empty objects like NULL
, character(0L)
, etc.
x <- list(a=1, b=NULL, c=list(x=1,y=NULL,z=logical(0L),w=c(NA,1)))
str(x)
# List of 3
# $ a: num 1
# $ b: NULL
# $ c:List of 4
# ..$ x: num 1
# ..$ y: NULL
# ..$ z: logi(0)
# ..$ w: num [1:2] NA 1
To clear all NULL
values in the list recursively, we can call
str(list.clean(x, recursive = TRUE))
# List of 2
# $ a: num 1
# $ c:List of 3
# ..$ x: num 1
# ..$ z: logi(0)
# ..$ w: num [1:2] NA 1
To remove all empty values including NULL
and zero-length vectors, we can call
str(list.clean(x, function(x) length(x) == 0L, recursive = TRUE))
# List of 2
# $ a: num 1
# $ c:List of 2
# ..$ x: num 1
# ..$ w: num [1:2] NA 1
The function can also be related to missing values. For example, exclude all empty values and vectors with at least NA
s.
str(list.clean(x, function(x) length(x) == 0L || anyNA(x), recursive = TRUE))
# List of 2
# $ a: num 1
# $ c:List of 1
# ..$ x: num 1
subset()
is implemented for list object in a way that combines list.filter()
and list.map()
. This function basically filters a list while at the same time maps the qualified list elements by an expression.
people %>>%
subset(Age >= 24, Name)
# [[1]]
# [1] "Ken"
#
# [[2]]
# [1] "James"
#
# [[3]]
# [1] "Penny"
people %>>%
subset("reading" %in% Interests, sum(as.numeric(Expertise)))
# [[1]]
# [1] 9
#
# [[2]]
# [1] 7