> Read a CSV file, foo.csv, and sort by the sum of the 4th and 7th columns (whic...

em500 · on Dec 13, 2020

Concretely,

    awk -F, -v OFS=, '{print $2+$3, $1, $2, $3}' foo.csv  |  sort -n  |  cut -d, -f2-

enriquto · on Dec 13, 2020

I think this code does not fulfill the task: you need to sum columns 4 and 7, and keep the rest of the data intact in the output.

em500 · on Dec 13, 2020

You're right, that should be

    awk -F, -v OFS=, '{print $4+$7, $0}' foo.csv  |  sort -n -k1 -t, |  cut -f2- -d,

Explanation:

awk -F, -v OFS=, sets the input and output column separator to comma; '{print $4+$7, $0}' prints the sum of columns 4 and 7, followed by the entire original line.

sort -n -k1 -t, sorts the lines numerically on column 1, using comma as the separator.

cut -f2- -d, removes column 1 (the temporary sum), again using comma as the separator.
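To check the corrected pipeline, here is a minimal sketch that runs it on a small, made-up foo.csv (three rows, seven columns, no quoting). The function name sort_by_sum and the LC_ALL=C prefix on sort are my additions, not part of the pipeline above; LC_ALL=C only guards against locales where sort -n treats ',' as a decimal separator.

```shell
#!/bin/sh
# Work in a scratch directory so an existing foo.csv is not overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data: plain comma-separated values, no quoted fields.
cat > foo.csv <<'EOF'
a,1,2,10,0,0,5
b,1,2,1,0,0,1
c,1,2,3,0,0,3
EOF

# Prepend (column 4 + column 7) as a temporary key, sort numerically on it,
# then cut the key off again. LC_ALL=C keeps sort from reading ',' as a
# decimal separator in locales that use one.
sort_by_sum() {
    awk -F, -v OFS=, '{print $4+$7, $0}' "$1" |
        LC_ALL=C sort -n -k1 -t, |
        cut -f2- -d,
}

sort_by_sum foo.csv
```

The sums are 15, 2 and 6, so the rows come out in the order b, c, a, each unchanged.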

This is of course not robust for general CSV files, but I don't think OP's marcel is either. A robust solution requires a proper CSV parser. Quoted fields that contain commas, for instance, will throw off the column count.
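To see the failure concretely, a minimal sketch with one made-up row whose second field is a quoted "Smith, J". A CSV parser would read columns 4 and 7 as 4 and 5 (sum 9); awk -F, splits inside the quotes, so every later column shifts right by one. The row and the function name naive_sum are illustrative only.

```shell
#!/bin/sh
# Hypothetical row: the second field is quoted and contains a comma.
row='x,"Smith, J",3,4,0,0,5'

# awk -F, splits on every comma, including the one inside the quotes,
# so it sees 8 fields instead of 7.
naive_sum() {
    printf '%s\n' "$1" | awk -F, '{print $4+$7}'
}

naive_sum "$row"   # prints 3 (awk's $4 is 3 and $7 is 0), not 9
```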

The biggest wart I see in the classic Unix solution is that each tool uses a different flag for the field separator (-F for awk, -t for sort, -d for cut).

edit: if you know that the CSV doesn't contain tabs, you can omit some flags for a more concise version:

    awk -F, -v OFS='\t' '{print $4+$7,$0}'  foo.csv | sort -n -k1 | cut -f2-

since cut uses tab as its delimiter by default and sort splits fields on whitespace by default. If you're unsure about the contents of the CSV, you really need a proper CSV parser.
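A sketch of the tab-keyed variant on the same kind of made-up data. It should print exactly the same lines as the comma version: awk prints $0 unchanged, and only the temporary key is tab-separated from it. The function name is hypothetical, and LC_ALL=C is again an added precaution rather than part of the original command.

```shell
#!/bin/sh
# Scratch directory so an existing foo.csv is left alone.
cd "$(mktemp -d)" || exit 1

# Made-up sample data; assumes the CSV contains no tab characters.
cat > foo.csv <<'EOF'
a,1,2,10,0,0,5
b,1,2,1,0,0,1
c,1,2,3,0,0,3
EOF

# awk still splits its input on ',', but joins the key and the untouched
# original line ($0) with a tab. cut splits on tabs by default and
# sort -n only reads the leading number, so neither needs a separator flag.
sort_by_sum_tab() {
    awk -F, -v OFS='\t' '{print $4+$7, $0}' "$1" |
        LC_ALL=C sort -n -k1 |
        cut -f2-
}

sort_by_sum_tab foo.csv
```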

geophile · on Dec 13, 2020

> This is of course not robust for general CSV files, but I don't think OP's marcel is either. A robust solution requires a proper CSV parser.

I haven't tested corner cases, but marcel relies on the Python csv module, which is probably better than any initial attempt at a parser that I could write in an hour.

This is what I meant about sublanguages. Many people (myself included) would need to go to the man pages to find the necessary arguments to awk, sort, and cut. I find it much easier to just write a little Python, even if the end result involves more typing.

enriquto · on Dec 13, 2020

In the end, it doesn't look that bad. Of course, using the CSV format is a bad start in Unix; it's much better to convert everything to TSV and work from there. In that case the "obvious" shell solution is quite clear:

    <foo.tsv awk -F'\t' -v OFS='\t' '{print $4+$7,$0}' | sort -n -k1 | cut -f2-
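A sketch of the TSV route, assuming the data is first converted with a naive tr , '\t' (only safe when no field is quoted or contains a comma or tab). Note the explicit -F'\t' -v OFS='\t'. With awk's default OFS, a single space, the key and the original line are joined by a space, so cut -f2- treats "sum first-column" as one tab-delimited field and drops the original first column along with the key. The sample data and function name are hypothetical.

```shell
#!/bin/sh
# Scratch directory so an existing foo.tsv is left alone.
cd "$(mktemp -d)" || exit 1

# Made-up data converted with a naive tr: only safe when no field is
# quoted and no field contains a comma or a tab.
printf 'a,1,2,10,0,0,5\nb,1,2,1,0,0,1\nc,1,2,3,0,0,3\n' | tr , '\t' > foo.tsv

# FS and OFS are both a tab, so the temporary key is separated from the
# original line by the same character that cut splits on.
sort_tsv_by_sum() {
    awk -F'\t' -v OFS='\t' '{print $4+$7, $0}' "$1" |
        LC_ALL=C sort -n -k1 |
        cut -f2-
}

sort_tsv_by_sum foo.tsv
```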