Formatting data for excel copy/paste

help by Bast on Thursday December 26th, 2024

This one is short and sweet. I was working on a quick script, and wanted to know how Excel's auto-formatting function worked to determine if I could write my output to the terminal, highlight, and then copy-paste into an excel window.

In brief, 10/10 hindsight, it should be obvious. Print it out in TSV or CSV format. The search results are full of people suggesting "write a CSV/TSV file and import that" and they're right, but nobody tells you why.

It's because copy-paste doesn't work.

And it's not even consistent. I spotted one of them, figured "might as well try tabs instead of searching more", copied from pycharm, pasted into excel, and it worked. And then I ran it in the builtin mac Terminal.app, copy paste… and it's all in one column.

What?

Why does that matter? Well, pasting from the terminal also includes styling information, for some reason. My terminal defaults to black-on-white, but then I paste that into excel and I get this near-invisible off-grey on white. So it pastes.. the text color but not the background?

In hindsight, writing to a file makes a lot more sense. Can't screw up copy-paste from a text file. But apparently you can screw up copy-pasting from a terminal. There's a lot more weirdness present: if you copy-paste TSV data from and then back into the terminal…

And it works… if it's only one line.

What?

Go on. Try it yourself:

a = "0    0"

vs

a = """
0    0
1    2
"""

This isn't consistent across terminals: Terminal.app breaks. The pycharm embedded terminal functions (but pastes newlines-as-part-of-a-line when you do, which is interesting. Looking into readline brings up details about bracketed paste mode–which is a way of indicating to/from a terminal how to wrap paste events so they can be interpreted correctly (which was related to a variety of issues in the python 3.6 era were multiline pastes into the repl were interpreted as a single line despite containing newlines).

>>> a = """
… 0    0
0    0
… """

This is how it's represented in pycharm, when you paste. As you can see, there is a missing on that second line, which indicates python didn't receive it separately. On another, different note.. even with my editor configured into tabs mode (else it would replace all the tabs in this post with four spaces) copy-pasting from said in-application pycharm terminal replaced the tabs in the content I had directly pasted.. with spaces.

This is why it's better to just use spaces. Instead of trying to fight this madness. But we were here because we were hoping for better behavior from excel.

Tested with this code:

for i in range(5):
    print("\t".join(str(i * n) for n in range(3)))

Not overly pretty code, but pasteable. Even in one line.

Using pycharm terminal tab (set to accurate mode, because I needed that for curses at some point): outputs space-delimited elements despite printing tab characters. Copy-pasting results in spaces, and separated numbers but still one cell wide excel results.

Using the "run python" tab that presumably does not embed Terminal.app or however they're doing it, the spaces are not converted, and copy-pasting works as expected and desired (!).

Running it in terminal.app results in broken rich paste behavior.

iTerm2 (the other emulator I use on occasion) prints tabs correctly, and copies into excel correctly. However, copy-pasting the tabbed output back into the same-terminal python console results in the tabs being stripped (bracketed paste off and tabs being interpreted as autocomplete requests? It does ring the bell in both terminals…).

This is all on macos under python 3.12. Both terminals should support bracketed paste mode, so it would be python not utilizing it properly.

But not all of this mess is python's fault. Comma separated items would avoid the entire issue, but aren't parsed by my version of excel (2019, which is only just growing old now..) at all. Pasting output from the terminal into other applications that do rich paste, like textedit, bring over all the formatting, not just the text color. So excel is also responsible for stripping the background shade but not the foreground shade. Or you can selectively paste without formatting, but that just pastes the tab separated data into one column again..

It's a nightmare. It only worked for me the first time because of sheer chance. So yes, the answer is, in fact, to write it to a file and import it. Because otherwise you're about to have fun™.

The CSV module documentation is a little hairy, but this ought do the trick:

data = [(1, 2, 3)]

import csv
with open("output.csv", "w") as f:
    csv.writer(f).writerows(data)

writerow works individually if you want a print() analogue.

Dictwriter is more verbose to work with:

data = [{"a": 1, "b": 2, "c": 3}]

import csv
with open("output.csv", "w") as f:
    d = csv.DictWriter(f, ["a", "b", "c"])
    d.writeheader()
    d.writerows(data)

but ultimately not that problematic. Lazy helper function:

data = [{"a": 1, "b": 2, "c": 3}]


def lazy_dcsv(data: list[dict], filename: str):
    if not data:
        raise ValueError("Empty data")
    import csv
    with open(filename, "w") as f:
        d = csv.DictWriter(f, data[0].keys())
        d.writeheader()
        d.writerows(data)

If I wasn't lazy I'd do write-rename swapping.