Rlyeh, esoterica!
educates by Bast on Monday January 13th, 2025
# All good rituals start with the ctynomicon
import ctypes
# Establish universal truths
THE_ANSWER = 0x2a + (6&6)
# See Without Seeing
THE_WAY = b'''50 68 27 6e 67 6c 75 69 20
6d 67 6c 77 27 6e 61 66 68 20 43 74 68 75
6c 68 75 20 52 27 6c 79 65 68 20 77 67 61
68 27 6e 61 67 6c 20 66 68 74 61 67 6e 2e'''
# Casual blood magic
def we_WEEP(s, o, v):
    ctypes.c_char.from_address(o+THE_ANSWER+id(s)).value = v
# Invoke primordial Chaos.
@(lambda dreamers: (globals().update({bytes(int(bytes((THE_ANSWER,*b"X",*i)), 2*4*2) for i in THE_WAY.split()): dreamers})))
def rthyl(a, y):
    # Tear back the veil
    *(we_WEEP(a, i, b) for i, b in enumerate(y)),
# Trust in your Eyes!
a = "In his house at R'lyeh, dead Cthulhu waits dreaming."
print(a)
# Speak Without Speaking!
@(he_comes:=(lambda Δ=(lambda: he_comes): Δ()))
def Δ():
    [globals()[chosen](a, chosen)][:] = *(chosen:=sacrifice for sacrifice in globals() if len(sacrifice.split()) - 1),
# He COMES!
he_comes()
print(a)
Before reading, I highly suggest you give it a run. It's compatible with probably every version of python by now, although do note the usage of a walrus, which was introduced in 3.8 (PEP 572), and the relaxed decorator syntax introduced in 3.9 (PEP 614). While there is some obfuscated text, in a moment you'll see that there simply isn't enough of it around to be malicious even if it wanted to.
Now, with that out of the way, I'm sure you have questions! Most of the trickery is just smoke and mirrors. The second half of the quote is indeed stored within
THE_WAY = b'''50 68 27 6e 67 6c 75 69 20
6d 67 6c 77 27 6e 61 66 68 20 43 74 68 75
6c 68 75 20 52 27 6c 79 65 68 20 77 67 61
68 27 6e 61 67 6c 20 66 68 74 61 67 6e 2e'''
Which, run through a very basic hex to character translation (because it's ascii) gives us:
>>> THE_WAY = b'''50 68 27 6e 67 6c 75 69 20
... 6d 67 6c 77 27 6e 61 66 68 20 43 74 68 75
... 6c 68 75 20 52 27 6c 79 65 68 20 77 67 61
... 68 27 6e 61 67 6c 20 66 68 74 61 67 6e 2e'''
>>> "".join(chr(int(i.decode(), 16)) for i in THE_WAY.split())
"Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn."
Nothing else in the code contains enough entropy (bytes sufficiently random to transform something into something else) to be dangerous.
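As a cross-check on the hex decode above, the standard library can do the same job in one call. This is a minimal sketch, assuming Python 3.7 or later (where bytes.fromhex skips all ASCII whitespace, newlines included); the name decoded is only for illustration:

```python
THE_WAY = b'''50 68 27 6e 67 6c 75 69 20
6d 67 6c 77 27 6e 61 66 68 20 43 74 68 75
6c 68 75 20 52 27 6c 79 65 68 20 77 67 61
68 27 6e 61 67 6c 20 66 68 74 61 67 6e 2e'''

# bytes.fromhex wants a str, so decode the bytes literal first.
decoded = bytes.fromhex(THE_WAY.decode("ascii")).decode("ascii")
print(decoded)
```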
In hindsight, I probably should have obscured this a little more. Perhaps with an offset, or using a neat little trick with utf-16-be:
>>> a = "Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn." + " "
>>> a.encode().decode("utf-16-be")
'偨❮杬畩\u206d杬眧湡晨⁃瑨畬桵⁒❬祥栠睧慨❮慧氠晨瑡杮⸠'
This works provided your text length is divisible by 2, for any ascii character:
>>> import string
>>> string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
>>> len(string.printable)
100
>>> string.printable.encode().decode("utf-16-be")
'〱㈳㐵㘷㠹慢捤敦杨楪歬浮潰煲獴當睸祺䅂䍄䕆䝈䥊䭌䵎佐兒協啖坘奚™⌤┦✨⤪⬬\u2d2e⼺㬼㴾㽀孜嵞彠筼絾\u2009\u0a0dଌ'
>>> _.encode("utf-16-be").decode()
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
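Wrapped up as a pair of helpers, the round trip might look like the sketch below. The names veil and unveil are made up for this example, and the padding rule (append one space when the length is odd) is an assumption carried over from the session above, not something the original script does:

```python
def veil(text: str) -> str:
    """Reinterpret ASCII bytes as big-endian UTF-16 code units."""
    raw = text.encode("ascii")
    if len(raw) % 2:
        raw += b" "  # pad to an even number of bytes
    return raw.decode("utf-16-be")


def unveil(veiled: str) -> str:
    """Undo veil(); any padding space is left in place."""
    return veiled.encode("utf-16-be").decode("ascii")


hidden = veil("Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn.")
print(hidden)
print(unveil(hidden))
```

Two ASCII bytes can never combine into a surrogate code unit, which is why the decode step does not raise for any ASCII input.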
But why? And how?
Let's take our original code:
>>> a = "Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn." + " "
>>> a.encode().decode("utf-16-be")
'偨❮杬畩\u206d杬眧湡晨⁃瑨畬桵⁒❬祥栠睧慨❮慧氠晨瑡杮⸠'
>>> _.encode()
b'\xe5\x81\xa8\xe2\x9d\xae\xe6\x9d\xac\xe7\x95\xa9\xe2\x81\xad\xe6\x9d\xac\xe7\x9c\xa7\xe6\xb9\xa1\xe6\x99\xa8\xe2\x81\x83\xe7\x91\xa8\xe7\x95\xac\xe6\xa1\xb5\xe2\x81\x92\xe2\x9d\xac\xe7\xa5\xa5\xe6\xa0\xa0\xe7\x9d\xa7\xe6\x85\xa8\xe2\x9d\xae\xe6\x85\xa7\xe6\xb0\xa0\xe6\x99\xa8\xe7\x91\xa1\xe6\x9d\xae\xe2\xb8\xa0'
>>> ' '.join(hex(i)[2:] for i in _)
'e5 81 a8 e2 9d ae e6 9d ac e7 95 a9 e2 81 ad e6 9d ac e7 9c a7 e6 b9 a1 e6 99 a8 e2 81 83 e7 91 a8 e7 95 ac e6 a1 b5 e2 81 92 e2 9d ac e7 a5 a5 e6 a0 a0 e7 9d a7 e6 85 a8 e2 9d ae e6 85 a7 e6 b0 a0 e6 99 a8 e7 91 a1 e6 9d ae e2 b8 a0'
Hm, that's not as helpful as I was hoping it would be. There's definitely a series of patterns here, and everything is in the "high bits". The way this works is that, by using big endian specifically, we can reinterpret pairs of utf-8 bytes (which for ascii only ever use the lower 7 bits) as single 16-bit utf-16 code units, with the first byte of each pair landing in the high bits. The raw byte values for Ph'nglui are 80 104 39 110 103 108 117 105, or:
>>> 'Ph\'nglui'.encode()
b"Ph'nglui"
>>> [*_]
[80, 104, 39, 110, 103, 108, 117, 105]
>>> [bin(i) for i in _]
['0b1010000', '0b1101000', '0b100111', '0b1101110', '0b1100111', '0b1101100', '0b1110101', '0b1101001']
>>> 'Ph\'nglui'.encode()
b"Ph'nglui"
>>> 'Ph\'nglui'.encode().decode("utf-16-be")
'偨❮杬畩'
>>> 'Ph\'nglui'.encode().decode("utf-16-be").encode()
b'\xe5\x81\xa8\xe2\x9d\xae\xe6\x9d\xac\xe7\x95\xa9'
>>> [*_]
[229, 129, 168, 226, 157, 174, 230, 157, 172, 231, 149, 169]
>>> 'Ph\'nglui'.encode().decode("utf-16-be")
'偨❮杬畩'
>>> [ord(i) for i in _]
[20584, 10094, 26476, 30057]
So we go from 8 characters to 4 characters. Big endian implies that the big part of the number comes first. And since we have half the number of characters, this suggests that the lower half…
>>> 20584 - 104
20480
>>> # suspiciously round number
>>> 20480 >> 8
80
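The relationship those numbers hint at can be checked for every pair at once. A small sketch (the names raw, pairs and packed are just for illustration) that packs each pair by hand and compares the result with what the utf-16-be decode produces:

```python
text = "Ph'nglui"
raw = text.encode("ascii")

# Pair the bytes up: (80, 104), (39, 110), ...
pairs = zip(raw[0::2], raw[1::2])

# The first byte of each pair becomes the high 8 bits, the second the low 8 bits.
packed = [(high << 8) | low for high, low in pairs]

print(packed)
print(packed == [ord(c) for c in raw.decode("utf-16-be")])
```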
Yep. So our transform is to take the bits of the first character in each set of two, left shift by 8, and add the second (combining them into one 16 bit number), which is exactly what decoding the bytes as big-endian utf-16 does. A side question is what this has to do with 229 129. That's the utf-8 encoding:
>>> len(b'\xe5\x81\xa8\xe2\x9d\xae\xe6\x9d\xac\xe7\x95\xa9')
12
50% longer than even the raw utf-16 bytes. This is because we re-encoded the utf-16 variant text as utf-8 bytes, and utf-8 only fits code points up to 127 in a single byte; anything larger is spread across a lead byte and one or more continuation bytes. Our original 80 104 (two bytes, which utf-16 read together as one code point) becomes 20584, which is 偨 (CJK Unified Ideograph-5068, or cī). The utf-8 encoding of 偨 is:
>>> "偨".encode()
b'\xe5\x81\xa8'
>>> [*_]
[229, 129, 168]
229 129 168.
The actual construction is interesting too (and hopefully you don't mind this extensive rabbit hole I've been digging). We know this is a three-byte encoded utf-8 code point, which means the first byte has the form 1110xxxx, and the second and third 10xxxxxx. We can tear this apart manually:
>>> a = "偨".encode()
>>> a
b'\xe5\x81\xa8'
>>> [*a]
[229, 129, 168]
>>> # First byte
>>> a[0] ^ int("11100000", 2)
5
>>> bin(5)
'0b101'
>>> # Second byte
>>> a[1] ^ int("10000000", 2)
1
>>> # Third byte
>>> a[2] ^ int("10000000", 2)
40
>>> bin(_)
'0b101000'
>>> # Combine:
>>> f"{5:>04b}{1:>06b}{40:>06b}"
'0101000001101000'
>>> int(_, 2)
20584
>>> chr(20584)
'偨'
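The same teardown as a function, for anyone who wants to poke at other characters. This is only a sketch: decode_three_byte_utf8 is a made-up name, and it checks the marker bits but does not reject overlong encodings or surrogates the way a real decoder would.

```python
def decode_three_byte_utf8(data: bytes) -> int:
    """Return the code point stored in one three-byte UTF-8 sequence."""
    first, second, third = data
    # Check the marker bits: 1110xxxx 10xxxxxx 10xxxxxx
    if first >> 4 != 0b1110 or second >> 6 != 0b10 or third >> 6 != 0b10:
        raise ValueError("not a three-byte UTF-8 sequence")
    # Keep the payload bits (4 + 6 + 6) and stitch them together.
    return ((first & 0x0F) << 12) | ((second & 0x3F) << 6) | (third & 0x3F)


code_point = decode_three_byte_utf8(b"\xe5\x81\xa8")
print(code_point, hex(code_point), chr(code_point))
```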
And so that's how that works! It's pretty cool. Unfortunately I'd have to rework a decent portion of the original script and mention utf-16-be in it to use it. And that would ruin some of the fun. It's supposed to be hiding in plain sight, after all, and behind an encoding that munges the text isn't quite plain sight. If you're experienced with hex or debugging you can probably read the text directly through its current encoding, which is the entire point! The magic! The theming!
It's right in front of your face the whole time and you just don't realize it because you haven't dug deep enough into the rabbit hole yet :P
# All good rituals start with the ctynomicon
import ctypes
I am slightly annoyed by this comment. It should be read "cytonomicon", but written ctyponomicon. It doesn't work too well, but I left it munged in the hopes that it would be mentally read correctly despite the mismatch.
# Establish universal truths
THE_ANSWER = 0x2a + (6&6)
>>> 0x2a + (6&6)
48
48 is the answer: the byte offset from the start of the header of a string in cpython to the juicy string data within it. But I couldn't help but make it out of 0x2a (42, thank you early years of hhgttg), and 6&6 (because we are supposed to do something cursed here, and an almost-but-not-quite triple six fits perfectly.) The parentheses are mandatory: & binds too weakly, and would turn this into (0x2a + 6) & 6, which is 0.
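The 48 is tied to the interpreter build: it assumes 64-bit CPython with the string header layout the script was written against, and that layout is not guaranteed to be identical across releases. A hedged way to check what the offset is on a given interpreter, reading only and writing nothing, is to measure the header instead of hard-coding it. HEADER_SIZE and peek are names invented for this sketch, and the whole thing assumes CPython (where id() is the object's address) and a plain ASCII string:

```python
import ctypes
import sys

# Assumption: CPython, and a "compact ASCII" str whose character data
# sits directly after the object header.
HEADER_SIZE = sys.getsizeof("") - 1  # the empty string still stores one NUL byte

s = "In his house at R'lyeh"

# Read len(s) raw bytes starting just past the header.
peek = ctypes.string_at(id(s) + HEADER_SIZE, len(s))
print(HEADER_SIZE, peek)
```

If HEADER_SIZE does not print 48 on your interpreter, the hard-coded offset in the script will be writing to the wrong place there.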
# Casual blood magic
def we_WEEP(s, o, v):
    ctypes.c_char.from_address(o+THE_ANSWER+id(s)).value = v
Purveyors of python memetics will spot the trick in here:
def overwrite_string_byte(offset, object, value):
    ctypes.c_char.from_address(id(object) + 48 + offset).value = value
But it's all dressed up nice and evil! Crying tears of bytes, whispering about the answer to life, the universe, everything, and header offsets, and being cryptically abrupt with parameters in the wrong order. For the uninitiated, ctypes.c_char.from_address takes an address in memory (wherever it is) and treats it as a C char, which is generally a single byte. POSIX fixes it to 8 bits, and here we're assuming it's unsigned (and from experience, this is pretty consistent across platforms). So, essentially, this is a write-what-where primitive.
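To see the primitive without vandalising an immutable string, it can be pointed at memory that is meant to be written. A minimal sketch, where poke is a hypothetical helper and the target is a ctypes buffer we allocate ourselves:

```python
import ctypes

buf = ctypes.create_string_buffer(b"dead Cthulhu")  # writable C buffer we own
base = ctypes.addressof(buf)


def poke(address: int, value: int) -> None:
    """Write one byte at a raw address (hypothetical helper)."""
    ctypes.c_char.from_address(address).value = value


# Iterating over bytes yields ints, which c_char accepts for 0..255.
for offset, byte in enumerate(b"DEAD"):
    poke(base + offset, byte)

print(buf.value)
```

The only difference in the cursed version is the address: id(s) plus the header offset, which lands inside an object Python believes can never change.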
# Invoke primordial Chaos.
We'll get back to that later :3
# Trust in your Eyes!
a = "In his house at R'lyeh, dead Cthulhu waits dreaming."
print(a)
Do you trust your eyes? Very simple. Assign a string to a. Print it. And it does what you would expect. The first line of program output is, indeed:
In his house at R'lyeh, dead Cthulhu waits dreaming.
# Speak Without Speaking!
@(he_comes:=(lambda Δ=(lambda: he_comes): Δ()))
def Δ():
    [globals()[chosen](a, chosen)][:] = *(chosen:=sacrifice for sacrifice in globals() if len(sacrifice.split()) - 1),
What?
Yes, most unicode letters are valid names in python. The restrictions are, if I remember correctly, based upon the character classes, so you cannot use a numeric like ٦ (sitta, 6) on its own as a variable name (digits of any script can continue a name, but not start one):
>>> ٦
File "<stdin>", line 1
٦
^
SyntaxError: invalid character '٦' (U+0666)
>>> ٦ = 1
File "<stdin>", line 1
٦ = 1
^
SyntaxError: invalid character '٦' (U+0666)
>>> int("٦")
6
This is an interesting and potentially footgunny behavior from int() if you expect it to behave only on 0-9. Also, U+0666. But I digress (more than usual).
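The identifier rules can be probed without triggering a SyntaxError by asking str.isidentifier(). A short sketch; the checks dictionary and its labels are just for illustration:

```python
# "\u0666" is ARABIC-INDIC DIGIT SIX, the character from the session above.
six = "\u0666"

checks = {
    "Δ": "Δ".isidentifier(),                  # a letter: fine as a name
    "digit alone": six.isidentifier(),        # a digit cannot start a name
    "x + digit": ("x" + six).isidentifier(),  # but it can continue one
    "two words": "two words".isidentifier(),  # whitespace is never allowed
}
print(checks)
print(int(six))
```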
And the final lines:
# He COMES!
he_comes()
print(a)
The reveal! What does this print? Well, this is a magic trick after all. You would expect the same as above (In his house…). After all, we haven't changed a at all. But that's not what happens.
In his house at R'lyeh, dead Cthulhu waits dreaming.
Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn..
Now you probably have enough already to put it all together, which is why I told you to run it first before you spoil it! But there's plenty more to the trick.
Like this part:
# Speak Without Speaking!
@(he_comes:=(lambda Δ=(lambda: he_comes): Δ()))
def Δ():
    [globals()[chosen](a, chosen)][:] = *(chosen:=sacrifice for sacrifice in globals() if len(sacrifice.split()) - 1),
What the f#%k does this do?
Let's go piece by piece.
@(he_comes:=(lambda Δ=(lambda: he_comes): Δ()))
@ at the start of a line: this is a decorator, an expression that gets called with the function defined on the next line as its argument. Primarily used to wrap or manipulate functions as first-class objects, we have.. abused it here. Stripping the outer parentheses and the initiator (@):
he_comes:=(lambda Δ=(lambda: he_comes): Δ())
It's a walrus expression, which assigns the global/module level name he_comes to the result of.. something. The original code has no function def named he_comes; this is where it comes from. It's being assigned here.. but the walrus also evaluates to the value it assigned, so that very same object is what gets used as the decorator on def Δ.
Unwrap another parenthesis:
(lambda Δ=(lambda: he_comes): Δ())
lambdas bind late. When this line is evaluated, he_comes has no value yet, and looking it up right then would be a NameError. But the inner lambda's body isn't run at that point, only defined, so it is free to refer to the name that is about to hold the outer lambda, in a beautiful and oddly twisted circle:
@(name_one := (lambda name_two=(lambda: name_one): name_two()))
lambda name_two=(lambda: name_one): name_two()
@(he_comes:=(lambda Δ=(lambda: he_comes): Δ()))
# Called with no arguments this is a no-op. Same as `lambda: value` but with the inner lambda stored early, as a default
lambda x=(lambda: value): x()
The catch is that a decorator is never called with no arguments: it is called with the function being defined. That function takes the place of the default Δ, and the lambda's body, Δ(), runs it on the spot. So the body of def Δ executes the moment the def statement finishes, and the global name Δ ends up holding whatever that body returned (None), not a function. he_comes, meanwhile, still holds the lambda with its default intact, so the later he_comes() call just unwraps and returns he_comes itself. I love the inscrutability of it. The call at the bottom is theater; the def is where the real meat gets run.
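Decorator mechanics are easy to check in isolation, since @expr over def f is shorthand for f = expr(f). A small sketch with made-up names (entry, body, calls), assuming Python 3.9+ for the decorator syntax, that keeps the same shape as the he_comes line but records what actually runs and when:

```python
calls = []


@(entry := (lambda func=(lambda: entry): func()))
def body():
    calls.append("body ran")
    return "return value"


# The decorator was called with body as its argument, so func() ran it.
print(calls)

# The name "body" now holds whatever the decorator returned.
print(body)

# With no argument, the default kicks in and entry hands back itself.
print(entry() is entry)
```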
# Invoke primordial Chaos.
@(lambda dreamers: (globals().update({bytes(int(bytes((THE_ANSWER,*b"X",*i)), 2*4*2) for i in THE_WAY.split()): dreamers})))
def rthyl(a, y):
    # Tear back the veil
    *(we_WEEP(a, i, b) for i, b in enumerate(y)),
Another decorator, another lambda. This one accessing (setting) global variable names by string name. We can tear them apart somewhat:
{
bytes(int(bytes((THE_ANSWER,*b"X",*i)), 2*4*2) for i in THE_WAY.split()): dreamers
}
bytes(int(bytes((THE_ANSWER,*b"X",*i)), 2*4*2) for i in THE_WAY.split())
This is our decoder from above. For each item in THE_WAY, space delimited (as the characters are) do… int(bytes((THE_ANSWER,*b"X",*i)), 2*4*2). Gesundheit.
2*4*2 is easy. That's 16. Ah, this is a hex decode. int(something, 16). What are we decoding.. bytes((THE_ANSWER,*b"X",*i)).
Well, back in we go.
It's got two outer layers with (()) surrounding, meaning we're passing a tuple into bytes. That tuple is also important because we're passing.. THE_ANSWER: 48, the header offset. Oh, now you might catch what this is. It looks like the arguments to we_WEEP. *b"X" is unpacking again. Bytes objects are iterable as the integers within them. So we're putting:
>>> char = b"R"
>>> bytes((48, 88, *char))
b'0XR'
Huh. What does 0XR have to do with anything?
Does 0xR catch the eye better? Python doesn't require the x in a hex literal to be lowercase, and 48 happens to be the ASCII code for 0. We're turning each sequence of two characters (composing each byte of the obscured text) into hex literals, then interpreting them with int(x, 16) into the actual integer values of each code point. And then one final bytes() call on the sequence packs those integers into a bytes object. The globals() dictionary accepts bytes keys just fine, so we're actually creating a global variable with the name b"Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn." spaces and all, with a value of dreamers: the function passed into the lambda, or the def rthyl. (The lambda itself returns None, so the plain name rthyl no longer points at the function.)
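One step of the decoder with a real item from THE_WAY instead of a placeholder, as a sketch. The names token, literal and value are made up for the example; THE_ANSWER is written out as 48 to keep the block self-contained:

```python
THE_ANSWER = 48  # also the ASCII code of "0"

token = b"50"  # first item of THE_WAY.split()

# 48 -> "0", 88 -> "X", then the two hex digits: a hex literal in bytes form.
literal = bytes((THE_ANSWER, *b"X", *token))

# int() accepts bytes, and with base 16 it tolerates the 0X prefix.
value = int(literal, 16)
print(literal, value, chr(value))

# The whole decoder is that step applied to every item, fed into bytes().
name = bytes(int(bytes((THE_ANSWER, *b"X", *t)), 16) for t in b"50 68 27".split())
print(name)
```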
A cheeky comment later, and we have the remaining piece of the puzzle:
*(we_WEEP(a, i, b) for i, b in enumerate(y)),
A self-unpacking iterable, *(x for x in y), with the trailing comma required to make it unpack properly. Un-packed and un-cursed:
def rthyl(target, string):
    for index, byte in enumerate(string):
        we_WEEP(target, index, byte)
Which simply writes the bytes from the input string to the target. The target, of course, presumably being a, our globally saved string. But how does it get there? How is this even called, a function stored in a variable with spaces, capitals, and periods?
@(he_comes:=(lambda Δ=(lambda: he_comes): Δ()))
def Δ():
    [globals()[chosen](a, chosen)][:] = *(chosen:=sacrifice for sacrifice in globals() if len(sacrifice.split()) - 1),
The decorator stores its lambda under he_comes, which is our provided entry during the "simple" part of the trick, and then calls the function it was handed, so this body runs as soon as it is defined. So the question is what does this cursed line do. Let's start by breaking it into pieces.
First, we've got two halves. The left side is [globals()[chosen](a, chosen)][:] (gesundheit). The right side is another cloaked unpack iteration:
for sacrifice in globals():
    if len(sacrifice.split()) - 1:
        chosen = sacrifice
It's selecting a variable name (iterating over globals() gets you all global variable names) by searching for the last one with whitespace in the name. In a regular program this would never assign to chosen, because you can't have spaces in variable names. But we assigned to globals() during our initialization of rthyl while the decorator was running. So there is one. It has significantly more than one space, but only one is required: len(sacrifice.split()) - 1 is zero exactly when split() finds nothing to split on and returns the name as a single piece. And so once the right side executes (first) we have:
chosen = b"Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn."
[globals()[chosen](a, chosen)][:] = (chosen,)
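The selection step on its own, as a sketch with a stand-in dictionary. The names dictionary and its contents are hypothetical; the real code walks globals(), where every key is a str except the one bytes key planted by the decorator:

```python
names = {
    "a": "some string",
    "he_comes": "a lambda",
    b"two words here": "the hidden function",
}

chosen = None
for key in names:
    # split() with no arguments splits on whitespace, so a key without
    # any comes back as a single piece and len(...) - 1 is 0 (falsy).
    if len(key.split()) - 1:
        chosen = key

print(chosen)
print(names[chosen])
```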
globals()["…"] is rthyl from above. So this translates to rthyl(a, chosen): overwrite the contents of the global string a with chosen. (Inside the function a is not a local name, so the lookup falls through to the global.) The [...][:] = ... framing is a no-op other than ensuring we can run two things in the correct order in one appropriately cursed line.
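The ordering that framing relies on can be observed directly: in an assignment, Python evaluates the right-hand side before it evaluates the target expression. A sketch with made-up functions (left, right) that log when they run:

```python
order = []


def left():
    order.append("left")
    return "ignored"


def right(item):
    order.append("right")
    return item


# Right-hand side first (the starred generator is drained into a tuple),
# then the target list [left()] is built, then the slice is assigned
# into that throwaway list and discarded.
[left()][:] = *(right(x) for x in "ab"),

print(order)
```

The starred generator matters too: a bare generator expression would never run, and unpacking it into a tuple is what forces every item to be produced.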
And so, there you have it. It abuses decorators, unpacking, names, globals, and other neat tidbits to overwrite the memory a utilized to store the original string and transmute it into the new one.
You may have also noticed that the actual output does not 100% match the two strings: there's an extra period. That's because the replacement is one character shorter and thus the extra . is from the original, straightforwardly printed string. I could have added a space to the end of THE_WAY to overwrite it, but I like the double period. Something so cursed shouldn't also be so clean. It wouldn't be appropriate.
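The double period falls straight out of the lengths. A safe stand-in for the overwrite, using a bytearray in place of the string's memory (the names original, replacement and buffer are just for this sketch):

```python
original = b"In his house at R'lyeh, dead Cthulhu waits dreaming."
replacement = b"Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn."

buffer = bytearray(original)             # stand-in for the string's memory
buffer[:len(replacement)] = replacement  # overwrite in place, byte for byte

print(len(original), len(replacement))
print(buffer.decode("ascii"))
```

The last byte of the original is never touched, so its period survives behind the replacement's own.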