
> Read a CSV file, foo.csv, and sort by the sum of the 4th and 7th columns (which are integers):

Yes, that's a much better example. Here the "obvious" shell solution uses awk to compute the sum of the two columns and prepend it as the first field, sorts by that field, and then removes it again. I guess a "rosetta stone" of such examples (e.g. on the front page of the site) would make a strong case for marcel.



Concretely,

    awk -F, -v OFS=, '{print $2+$3, $1, $2, $3}' foo.csv  |  sort -n  |  cut -d, -f2-


I think this code does not fulfill the task: you need to sum columns 4 and 7, and keep the rest of the data intact in the output.


You're right, that should be

      awk -F, -v OFS=, '{print $4+$7, $0}' foo.csv  |  sort -n -k1 -t, |  cut -f2- -d,

Explanation:

awk -F, -v OFS=, sets the input and output column separators to comma; '{print $4+$7, $0}' outputs the sum of columns 4 and 7, followed by the original line.

sort -n -k1 -t, sorts the file numerically on column 1, with comma separator.

cut -f2- -d, removes column 1, with comma separator.
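
To see what each stage contributes, here is a minimal Python sketch of the same three steps (prepend the sum, sort on it, strip it again). The three sample rows are made up for illustration, and like the pipeline it assumes plain comma-separated fields with no quoting.

```python
# Hypothetical sample rows; columns 4 and 7 hold integers.
lines = [
    "a,b,c,5,e,f,6",
    "g,h,i,1,k,l,2",
    "m,n,o,3,q,r,1",
]

def decorate(line):
    # Like awk '{print $4+$7, $0}': prepend the sum as a new first field.
    fields = line.split(",")
    return f"{int(fields[3]) + int(fields[6])},{line}"

decorated = [decorate(line) for line in lines]

# Like sort -n -t, -k1: order by the numeric value of the first field.
decorated.sort(key=lambda line: int(line.split(",", 1)[0]))

# Like cut -d, -f2-: drop the first field again.
result = [line.split(",", 1)[1] for line in decorated]

print("\n".join(result))
```

The sums here are 11, 3 and 4, so the rows come out in the order g, m, a.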

This is of course not robust for general CSV files, but I don't think OP's marcel is either. A robust solution requires a proper CSV parser.

The biggest wart I see in the classic Unix solution is that all the tools use different flags for the field separator.

edit: if you know that the CSV doesn't contain tabs, you can omit some flags for a more concise version:

    awk -F, -v OFS='\t' '{print $4+$7,$0}'  foo.csv | sort -n -k1 | cut -f2-

since cut defaults to tab and sort to whitespace as the field separator. If you're unsure about the contents of the CSV, you really need a proper CSV parser.
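
As a small illustration of why splitting on commas is not robust, the sketch below compares a naive split with Python's csv module on a single row. The row is invented for the example; its second field contains a quoted comma.

```python
import csv

# Hypothetical row whose second field contains a comma inside quotes.
line = 'a,"x,y",c,4,e,f,7'

naive = line.split(",")            # roughly what awk -F, sees
parsed = next(csv.reader([line]))  # honours CSV quoting rules

print(len(naive), naive[3])    # 8 fields; the "4th column" is now 'c'
print(len(parsed), parsed[3])  # 7 fields; the 4th column is '4'
```

With the naive split every column after the quoted field shifts by one, so $4+$7 in awk would add up the wrong fields.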


> This is of course not robust for general CSV files, but I don't think OPs marcel is either. A robust solution requires a proper CSV parser.

I haven't tested corner cases, but marcel relies on the python csv module, which is probably better than any initial attempt at a parser that I could write in an hour.

This is what I meant about sublanguages. Many people (myself included) would need to go to the man pages to find the necessary arguments to awk, sort, and cut. I find it much easier to just write a little Python, even if the end result involves more typing.
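
For comparison, "a little Python" for this task might look like the sketch below. It uses only the standard csv module and is not marcel syntax; the function name `sort_by_column_sum` and the sample data are hypothetical, and it assumes no header row and that both columns always parse as integers.

```python
import csv
import io

def sort_by_column_sum(infile, outfile, a=3, b=6):
    """Sort CSV rows by int(row[a]) + int(row[b]); indices are 0-based."""
    rows = list(csv.reader(infile))
    rows.sort(key=lambda row: int(row[a]) + int(row[b]))
    csv.writer(outfile, lineterminator="\n").writerows(rows)

# Usage on in-memory data; the first row has a quoted comma in field 2.
src = io.StringIO('a,"x,y",c,5,e,f,6\ng,h,i,1,k,l,2\n')
dst = io.StringIO()
sort_by_column_sum(src, dst)
print(dst.getvalue(), end="")
```

For a real file, something like `open("foo.csv", newline="")` as the input and `sys.stdout` as the output should work the same way. It is more typing than the pipeline, but the quoted field survives the round trip.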


In the end, it doesn't look that bad. Of course, using the CSV format is a bad start in Unix. Much better to convert everything to TSV and work from there. In that case the "obvious" shell solution is quite clear.

    <foo.tsv awk -F'\t' -v OFS='\t' '{print $4+$7,$0}' | sort -n -k1 | cut -f2-



