Function Purity
Functions vs. Procedures
Just because you use the function keyword doesn’t mean you’re creating a function.
A procedure is a collection of operations that you need to do in a program. It can accept inputs. A lot of the time, when you use the function keyword, you’re really just creating a procedure.
So what is a function? A function must take inputs and return outputs.
However, a function must also call only other functions. The moment it calls a procedure, it becomes a procedure too.
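A minimal sketch of the distinction. The names logTotal, add, and addAndLog are hypothetical and chosen only for illustration:

```javascript
// Procedure: does some work (printing) but returns nothing useful.
function logTotal(x, y) {
  var total = x + y;
  console.log(total);
}

// Function: takes inputs and returns an output computed from them.
function add(x, y) {
  return x + y;
}

// Uses the function keyword and has a return statement, but it relies on
// a procedure, so by the definition above it is a procedure as well.
function addAndLog(x, y) {
  return logTotal(x, y); // logTotal returns undefined
}
```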
Getting clear about what makes a function a function matters because something must be a function in order to take advantage of functional principles.
Semantic Relationship Between Inputs and Outputs
Technically, you could make a function like this:
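A sketch consistent with the return20 example referenced in this section. The exact body and parameter name are assumptions:

```javascript
// Accepts an input and returns an output, but ignores the input entirely.
function return20(x) {
  return 20;
}

return20(1);    // 20
return20("hi"); // 20
```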
This would technically be considered a function. However, it goes against the spirit of what a function is.
In particular, a function is about the semantic relationship between inputs and outputs. To understand this, we need to go back to functions in math.
Take f(x) = x². This function, if graphed on a coordinate plane, would be a parabola (a U shape).
That coordinate plane is a visual representation of the relationship between inputs and outputs. Every x coordinate is an input, and every y coordinate is an output of that x coordinate. If x is 10, y is 100. If x is 0, y is 0. If x is -5, y is 25.
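As a sketch, the same input-to-output relationship can be written as a JavaScript function. The name square is an assumption, chosen because it describes how the output relates to the input:

```javascript
// Each input x maps to exactly one output, x * x.
function square(x) {
  return x * x;
}

square(10); // 100
square(0);  // 0
square(-5); // 25
```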
So if we go back to the return20 example, there is no real relationship between inputs and outputs. In fact, any input will always output 20.
Pro tip: When naming a true function, the best name is one that communicates the semantic relationship between inputs and outputs.
Side Effects
We need to keep expanding our understanding of what a function is though. Technically, if our current criteria were enough, this would be a function:
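A sketch of what such a snippet could look like. All names here (shippingRate, size, weight, speed, rate) and the formula are assumptions for illustration. The inputs are read from outer variables and the output is written to one:

```javascript
var rate;
var size = 12;
var weight = 4;
var speed = 5;

// Reads size, weight, and speed from the outer scope (indirect inputs)
// and assigns the result to the outer rate variable (indirect output).
function shippingRate() {
  rate = ((size + 1) * weight) + speed;
}

shippingRate();
rate; // 57
```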
This “function” technically accepts inputs (defined earlier) and returns outputs, and there is a semantic relationship between them. However, this isn’t a function. Why?
A function’s inputs and outputs must be direct. In the example above, they are indirect. These indirect changes are known as side effects.
Note: This applies specifically to function calls, not function definitions.
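For contrast, a sketch of a calculation whose inputs and outputs are direct: values come in as arguments and the result goes out through return. The name shippingRate, its parameters, and the formula are hypothetical:

```javascript
// Nothing is read from or written to the outer scope.
function shippingRate(size, weight, speed) {
  return ((size + 1) * weight) + speed;
}

var rate = shippingRate(12, 4, 5); // 57
```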
Types of side effects
Any sort of input/output (printing to console, changing files, etc.)
Database storage
Network calls
DOM
Generating timestamps
Generating random numbers
The above list shows us that side effects are not completely avoidable. The goal instead should be to minimize side effects as much as possible. That’s not because side effects are bad. They’re necessary in any application. However, side effects water down or impurify the benefits of functional programming.
Pro tip: When we have to do side effects, make them as obvious as possible.
Pure Functions & Constants
Pure function definition (so far)
A pure function is basically a function in the spirit of functional programming:
All inputs and outputs are direct
Semantic relationship between inputs and outputs
No side effects (this applies to the function call specifically)
Constants
It turns out that when we say a function’s inputs are direct, that doesn’t mean the values need to be passed as arguments. They can be in the outer scope and still respect functional principles:
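A sketch consistent with the names discussed in this section (z, addTwo, addAnother). The specific values are assumptions:

```javascript
var z = 1;

function addTwo(x, y) {
  return x + y;
}

// z is read from the outer scope, but nothing in this program reassigns it.
function addAnother(x, y) {
  return addTwo(x, y) + z;
}

addAnother(20, 21); // 42
```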
As you can see, z is in the outer scope, but assuming this is the full program in front of us, we know it will never change throughout the lifetime of the program. Therefore, in a practical sense, z is a constant and therefore a direct input, and addAnother is a pure function.
But what about const? At first glance, var just feels wrong because it can be reassigned, while const seems better because it guarantees the variable is never reassigned. However, think about this: addTwo can be reassigned as well, yet we don’t treat it as a problem! That shows that what matters is that variables don’t change in practice (not that they can’t change).
Value of pure functions
When we know there are no side effects in a program, we can analyze a snippet of code without having to worry about any code before it. Pure functions are predictable and reliable.
In the example above, suppose we don’t know if z is a side effect or not. In order to understand how the program runs, we have to mentally execute all the code before addAnother just to get up to speed with the state of the program. Only then can we assess addAnother.
In contrast, if we know z doesn’t ever change, we can focus on addAnother alone, trusting that nothing around it is affecting it. What’s more, we know addAnother won’t affect anything else when it executes.
Reducing Surface Area
An even better way to approach the assignment of z would be to reduce surface area.
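One way to sketch this, assuming z becomes a parameter of addAnother and addTwo becomes an inner function that closes over it (addOne is a hypothetical name for the result):

```javascript
// z is now a parameter, so the only code that could reassign it
// is the body of addAnother itself.
function addAnother(z) {
  return function addTwo(x, y) {
    return x + y + z;
  };
}

var addOne = addAnother(1);
addOne(20, 21); // 42
```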
In the example above, z is outside of the scope of addTwo. So is it a side effect? It turns out that it’s easy to answer that question because all we have to do is look at the lines of code inside addAnother: does z ever get reassigned? No, it doesn’t.
We are reducing the surface area of the code that we have to analyze. This pattern is a great way to increase confidence that you’re dealing with a pure function. Instead of having to read everything around addAnother, we only have to read what’s inside.
Same Input, Same Output
Pure function calls also act in isolation: given the same inputs, we will always get the same outputs.
Here's an example of a function that fails this criterion:
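One way this can happen, sketched with hypothetical names (getId, counter, record): the function reads a property whose getter returns a different value on each access.

```javascript
function getId(obj) {
  return obj.id;
}

var counter = 0;

// The getter bumps a counter, so every read of id yields a new value.
var record = {
  get id() {
    counter = counter + 1;
    return counter;
  }
};

getId(record); // 1
getId(record); // 2, same input but a different output
```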
The reason we want same inputs, same outputs is that the reader can trust the code. They can move on without having to chase down whether the code works as intended.
Level of Confidence
In the end, function purity is about a level of confidence: it's a scale of degrees, not a binary evaluation.
It doesn't make sense to say, "This is pure" or "This is not pure". It's more accurate to say, "I have a high degree of confidence in this function".
Impurity
If a function is not pure, what are our options?
1. Leave it impure but make it obvious that it is.
   Note: This makes sense for side effects you can't avoid, like writing to a database.
2. Extract the impurity from the function and make it its own procedure.
   Example: Before writing to a database, you usually perform computations. Instead of placing the computations and side effect all into one function definition (making the whole thing impure), you can break it into 2 parts.
3. Contain the impurity (reduce its surface area) so as to minimize its potential effects on other parts of the application.
   A side effect that affects 5 lines of code is better than one that affects the global scope.
Extracting impurity
As stated, sometimes we can extract impurity by taking out the side effects and leaving behind a pure function.
Here's an example of a function definition with computations and side effects all packed together:
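A sketch of such a function. The names addComment, buildCommentElement, and commentsList are assumptions, and uniqueId and commentsList are simple stand-ins so the snippet runs outside a browser:

```javascript
// Stand-in for an ID generator (side effect: changes state on each call).
var nextId = 0;
function uniqueId() {
  nextId = nextId + 1;
  return nextId;
}

// Stand-in for a DOM list element; in a browser this would be a real node.
var commentsList = {
  children: [],
  appendChild(elem) {
    this.children.push(elem);
    return elem;
  }
};

// Stand-in for building a DOM element from a record.
function buildCommentElement(record) {
  return { tag: "li", id: record.id, text: record.text };
}

// Computation and side effects are packed together here.
function addComment(userId, comment) {
  var record = { id: uniqueId(), userId: userId, text: comment };
  var elem = buildCommentElement(record);
  commentsList.appendChild(elem);
}

addComment(42, "This is my first comment!");
```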
What you want to do is refactor the function to extract out uniqueId and appendChild, as they are side effects.
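A sketch of the refactor under the same assumptions: uniqueId, commentsList, and buildCommentElement are stand-ins so the snippet runs outside a browser, and the parameter order of newComment is a guess.

```javascript
// Stand-in for an ID generator (side effect: changes state on each call).
var nextId = 0;
function uniqueId() {
  nextId = nextId + 1;
  return nextId;
}

// Stand-in for a DOM list element.
var commentsList = {
  children: [],
  appendChild(elem) {
    this.children.push(elem);
    return elem;
  }
};

// Stand-in for building a DOM element from a record.
function buildCommentElement(record) {
  return { tag: "li", id: record.id, text: record.text };
}

// Pure: the ID comes in as an argument and the element is returned.
function newComment(userId, commentId, comment) {
  var record = { id: commentId, userId: userId, text: comment };
  return buildCommentElement(record);
}

// The side effects now live in the outer shell.
var commentId = uniqueId();
var elem = newComment(42, commentId, "This is my first comment!");
commentsList.appendChild(elem);
```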
Now newComment is a pure function. Instead of generating a unique ID, it accepts one as an argument. Additionally, instead of manipulating the DOM directly, it returns a DOM element ready to be appended.
Side effects are now in the outer shell, and you have a pure function you can rely on.
Containing impurity
Another approach is to contain impurity within a small surface area of the code.
There are specifically 2 ways to contain impurity:
Wrapper functions
Adapter functions
To illustrate wrapper functions, imagine a function that performs side effects on a global-scoped array by mutating it:
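A sketch of such a function, using the names numbers and insertSortedDesc from this section. The body is an assumption about how a descending insert might be written:

```javascript
var numbers = [];

// Mutates the global numbers array, keeping it in descending order.
function insertSortedDesc(v) {
  var idx = numbers.findIndex(function isNotGreater(n) {
    return n <= v;
  });
  if (idx === -1) {
    idx = numbers.length;
  }
  numbers.splice(idx, 0, v);
}

insertSortedDesc(3);
insertSortedDesc(5);
insertSortedDesc(1);
insertSortedDesc(4);
numbers; // [5, 4, 3, 1]
```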
To contain the impurity of mutating numbers, we can wrap insertSortedDesc in another function and mutate a copy of the array.
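A sketch of the wrapper, assuming getSortedNums takes the source array and the new value as arguments (the signature and the name original are guesses):

```javascript
function getSortedNums(nums, v) {
  // Copy first, so the caller's array is never touched.
  var numbers = nums.slice();
  insertSortedDesc(v);
  return numbers;

  // Still impure, but it can only mutate the local copy above.
  function insertSortedDesc(v) {
    var idx = numbers.findIndex(function isNotGreater(n) {
      return n <= v;
    });
    if (idx === -1) {
      idx = numbers.length;
    }
    numbers.splice(idx, 0, v);
  }
}

var original = [5, 3, 1];
getSortedNums(original, 4); // [5, 4, 3, 1]
original;                   // [5, 3, 1], unchanged
```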
With our wrapper function refactor, all the impurity is contained within the getSortedNums function. This means less surface area to worry about!
Adapter functions are more awkward to use, but here's how they work in pseudocode:
1. Store original global state in new variables.
2. Copy global state and assign copied version to original variables.
3. Call impure function, allowing the function to mutate the copy of global state.
4. Store mutated copied state in new variables.
5. Assign stored original global state back into original variables.
6. Return mutated copied state.
In other words, you're setting aside the original global state, replacing it with a copy, allowing the impure function to mutate that copy, and then returning the copy plus setting the original global state back in its place.
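The steps above can be sketched as follows, assuming the impure insertSortedDesc and the global numbers array cannot be modified. The adapter name getSortedNums and the local names origNumbers and sorted are hypothetical:

```javascript
// Global state and an impure function we assume we cannot change.
var numbers = [5, 3, 1];

function insertSortedDesc(v) {
  var idx = numbers.findIndex(function isNotGreater(n) {
    return n <= v;
  });
  if (idx === -1) {
    idx = numbers.length;
  }
  numbers.splice(idx, 0, v);
}

// Adapter: swaps in a copy of the global state around the impure call.
function getSortedNums(v) {
  var origNumbers = numbers;     // 1. store the original global state
  numbers = origNumbers.slice(); // 2. replace it with a copy
  insertSortedDesc(v);           // 3. the impure call mutates the copy
  var sorted = numbers;          // 4. store the mutated copy
  numbers = origNumbers;         // 5. put the original state back
  return sorted;                 // 6. return the mutated copy
}

getSortedNums(4); // [5, 4, 3, 1]
numbers;          // [5, 3, 1], restored
```

Note that this sketch does not restore the global state if the impure call throws; wrapping steps 3 to 5 in try/finally would be one way to handle that.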
Caveat: Adapter functions are much harder to use when there is a lot of global state.