Aliasing? In my Python?
educates by Bast on Sunday January 12th, 2025
Recently I was asked by someone about improvements to one of their leetcode solutions. Amidst other things (leetcode does not, in fact, run your code on an isolated machine and so runtimes are quite rough approximations), I spotted a potential footgun, and it led into a rather interesting rabbit hole that finally got me motiviated enough to write a post about it.
So, here we are:
The Problem
The relevant code:
class Solution:
def addTwoNumbers(self, n1: Optional[ListNode], n2: Optional[ListNode]) -> Optional[ListNode]:
carry = 0
result = pre_head = ListNode(0)
while n1 and n2:
carry, sum = divmod(n1.val + n2.val + carry, 10)
result.next = result = ListNode(sum)
n1, n2 = n1.next, n2.next
if not carry:
result.next = n1 or n2
return pre_head.next
while n1:
carry, sum = divmod(n1.val + carry, 10)
result.next = result = ListNode(sum)
n1 = n1.next
while n2:
carry, sum = divmod(n2.val + carry, 10)
result.next = result = ListNode(sum)
n2 = n2.next
if carry:
result.next = ListNode(1)
return pre_head.next
Solution altered to protect the guilty/innocent. I also haven't bothered to hook up a test harness, so it's not necessarily correct or functional (whether originally or due to my edits).
The part we'll be looking at is this part, the carry
shortcut:
if not carry:
result.next = n1 or n2
return temp.next
What this code does is, once you've consumed to the end of one or the other linked-list-numbers, if you don't have a carry bit then you don't actually have to keep adding: the remainder of the result will simply be an exact copy of whichever linked-list-number still has items remaining, or empty if neither do (and are null/None). It's an easy optimization to simply direct the end of your result LLN at the appropriate spot in the input and return.
It's also, interestingly, a rather potent footgun.
consider the following code:
linked1 = read_linked(input("First number:"))
linked2 = read_linked(input("Second number:"))
linked3 = addTwoNumbers(linked1, linked2)
linked1.set_at(3, 5)
result = addTwoNumbers(linked1, linked3)
print("Result:")
print_lln(result)
It looks rather innocent. We read in two numbers as linked lists of digits. Add two together. Change one of the digits, add that again, then return.
But if our numbers are, for example, 10000 and 1 respectively, then we hit a bug. The above routine should return 10001 + 10501 = 20502. But it returns 21002.
But why? And what does this have to do with Rust (I had to :P).
In fact, this is only technically a bug. If the documentation were to state that linked-list-numbers were never to be re-used and were to be manually copied, it wouldn't happen (clue #1). Second, if you were to write this in rust, it would refuse to compile (clue #2).
Third, that result sure looks like we doubled the third digit we set to 5, too (clue #3).
It's also enlightening to print out what the state of our two numbers are before that second addTwoNumbers() call:
linked1 = read_linked(input("First number:"))
linked2 = read_linked(input("Second number:"))
linked3 = addTwoNumbers(linked1, linked2)
linked1.set_at(3, 5)
print("Linked1:")
print_lln(linked1)
print("Linked3:")
print_lln(linked3)
Wait. Why is linked3
10501
? We called set_at
on linked1
, not linked3
?
I spoiled this in the title, and the slug, and the preamble. Sorry.
It's an aliasing bug. Linked1 and linked3 are not independent, because the end of linked3 is literally the same nodes (the same objects) as in linked1. And so changing them changes both numbers. We don't have two numbers, we actually have one and a half. They have been aliased.
This is the flaw I brought up. Unexpected aliasing is a potent footgun that lurks in many codebases. In rust, this will refuse to compile: either linked1
is not mutable (at which point you are forced to manually copy it after it's been moved into addTwoNumbers
the first time, and thus we can be sure that there is only one reference to the tail end of linked3, as the original linked1 is destroyed) or it is mutable, and rust rejects the code again because you have two mutable pointers to the same value (which is verboten again).
The two errors are the common headaches when it comes to writing rust: "value used after move" and "cannot borrow as immutable because it is also borrowed as mutable". They are scourges of beginners and experienced devs alike, standing steadfastedly in the way when you just want the code to work already.
But they're standing in the way of you and a footgun. There exist many different routines for which two inputs being the same object in some capacity lead to unexpected results:
a = {"a": 1, "b": 2}
b = a
def merge(a: dict[str, int], b: dict[str, int]) -> None:
"Merges b into a, appending [pre]-merged[/pre] to all keys already in a."
for key, value in b.items():
if key in a:
key += "-merged"
a[key] = value
This blows up (infinite loop). Why? Because dict.items() here is a view, and changes as the dictionary does. And since a is b, every key in b is in a. And thus for every item we add a new key to both b and a, infinitely expanding the amount of work we need to do and the amount of -merged
's appended to our keys.
merge()
is not alias-safe. We could add a guard clause, of course:
def merge(a: dict[str, int], b: dict[str, int]) -> None:
"Merges b into a, appending [pre]-merged[/pre] to all keys already in a."
# Guard against aliasing
assert a is not b
for key, value in b.items():
if key in a:
key += "-merged"
a[key] = value
or we could make a copy explicitly:
def merge(a: dict[str, int], b: dict[str, int]) -> None:
"Merges b into a, appending [pre]-merged[/pre] to all keys already in a."
for key, value in list(b.items()):
if key in a:
key += "-merged"
a[key] = value
But these are both slower. And crucially, that is what's wrong with the original leetcode submission too. It's simply faster to not do the copy–but you must do the copy at some point, and it's not always clear when you do. Ultimately, you have to decide:
Do you:
- copy every time you pass into a function?
- copy every time you receive data and you might alias?
- copy every time you receive data and you break if there's aliasing?
- give in to the insanity and sacrifice a goat on the altar under the full moon, chanting latin and wielding an ancient blade?
Ultimately, the speedup doesn't particularly matter for this segment of code: For sanity, the copy has to be made somewhere, because the cost of managing where you must make a copy and where you shouldn't for speed is high and unintuitive. A general routine like this simply isn't safe if it results in potential unintended aliasing because it's a double-blind: not only can someone not realize that a function has aliasing forbidden, they might not even realize they are doing it at all. And since it's value-dependent on top of that, it might even dodge light testing or trigger issues far from the original callsite.
In short, it's the worst kind of bug.
The original codebase can be somewhat simplified by removing that edge case:
class Solution:
def addTwoNumbers(self, n1: Optional[ListNode], n2: Optional[ListNode]) -> Optional[ListNode]:
carry = 0
result = pre_head = ListNode(0)
while n1 and n2:
carry, sum = divmod(n1.val + n2.val + carry, 10)
result.next = result = ListNode(sum)
n1, n2 = n1.next, n2.next
while n1:
carry, sum = divmod(n1.val + carry, 10)
result.next = result = ListNode(sum)
n1 = n1.next
while n2:
carry, sum = divmod(n2.val + carry, 10)
result.next = result = ListNode(sum)
n2 = n2.next
if carry:
result.next = ListNode(1)
return pre_head.next
It's clearer now what's going on, because there are no trick cases. We consume a paired iteration, then we burn down each number and move on. This could be made clearer, which I will do in a second but I want to note that we're now also blessed with a singular return statement!
Further improvements include consolidating the burn-down finisher (and, while not done here, stripping away the faux start node):
class Solution:
def addTwoNumbers(self, n1: Optional[ListNode], n2: Optional[ListNode]) -> Optional[ListNode]:
carry = 0
result = pre_head = ListNode(0)
while n1 and n2:
carry, sum = divmod(n1.val + n2.val + carry, 10)
result.next = result = ListNode(sum)
n1, n2 = n1.next, n2.next
remainder = n1 or n2
while remainder:
carry, sum = divmod(remainder.val + carry, 10)
result.next = result = ListNode(sum)
remainder = remainder.next
if carry:
result.next = ListNode(1)
return pre_head.next
Now there's three steps:
- do the addition of digits in both numbers
- finish with any digits left in the bigger number
- if there's a carry digit, add one to the end
There's no aliasing concerns: all items are re-created from scratch in a "pure" fashion.
Consequently, it's not actually about aliasing. Aliasing is a tradeoff between speed and data layout complexity. It's not about purity–even though purity helps with these error classes by ensuring you can't share mutability. It's not about rust either, despite the fact that rust manages to marry mutability rules to a reference scanner (borrow checker) in a way that makes this kind of issue impossible.
It's about clarity of behavior. It's about the fact that most type systems do not have a way to indicate "aliasing". You can't mark a pointer as potentially aliased or not, and mark your functions and data types as containing or not containing them (exceptions exist for LLVM IR, which does include specifications for this, the c restrict
directive, or the nonstandardized __restrict__
in cpp). It's about the fact that even in python you can get this unexpected behavior. It's about the fact that it's not part of a systems language or a dynamic language, it's about how your data behaves: any system capable of maintaining multiple references to a single mutable object is capable of aliasing problems.
It's about expected behavior and safe code. Resist the temptation to alias unless it proves to have great benefits (which is not the case here), as it restricts your further options and can result in unexpected behavior.
And take advantage of language features designed for this: aliasing lists or objects is dangerous.. but only because of mutability.
From a certain perspective, what's wrong with the original test code (not even the addTwoNumbers implementation) is the fact that you can change/mutate a number at all. Unfortunately, sometimes, mutation is the answer.
And so ultimately the reason we shouldn't use a shortcut like that is because the gain is small and the entropic cost is high. Without the shortcut, the data layout is simple and ordered: numbers come in, different number comes out. With the shortcut, numbers come in.. and sometimes we get an attached blend coming out.
Entropy is not just for chemists.