Hacker News
Awk by Example (ibm.com)
255 points by tosh on June 28, 2019 | 21 comments



Just grab the awk book, The AWK Programming Language: https://ia802309.us.archive.org/25/items/pdfy-MgN0H1joIoDVoI...

Yesterday, I wrote production-level code using awk. The alternative was writing the code in Go, which I figured would take all day, and I really only had about an hour. A 40-line shell script, mixing about 20 lines of awk with bash/sed/grep/tail/xargs/cut/cat/curl, got the job done.

AWK is very freaking powerful if you learn to parse data with it.


Probably a good book, but it serves a way different purpose than this article which is more of a gateway drug for newbies like myself who have never worked with awk before and don't quite understand the value-add.

I really enjoyed this article, because now that I have a sampling of the class of problems that awk can help me with, I am more likely to use it. And it only took 10 minutes to get this overview. My brain is already buzzing with ideas on how I can use this with gron[1] that was mentioned a few days ago on HN.

We all need a compelling reason for a tool before investing the time for whatever Bible may exist, so "just grab" the Bible first instead isn't the most helpful advice :) But I do appreciate you flagging it!

[1] https://github.com/tomnomnom/gron

---

P.S. For those who liked the writing style of this tutorial, you may enjoy these other ones on the IBM website by the same author Daniel Robbins: https://www.ibm.com/developerworks/views/linux/libraryview.j...


I wonder if Arnold Robbins (gawk maintainer) and Daniel Robbins are related (Wikipedia tells me Daniel is the founder of Gentoo Linux).


Might I ask why this old version (I haven't read it) instead of the manual/book for GNU awk? [1] It does a good job of mentioning differences from other implementations, and GNU awk extensions are marked.

[1] https://www.gnu.org/software/gawk/manual/


That old version is written by the ‘A’[0], ‘W’[1], and ‘K’[2] that are the eponymous inventors of awk, and the book is at least occasionally cited as basically a masterpiece.[3]

[0] https://en.wikipedia.org/wiki/Alfred_Aho

[1] https://en.wikipedia.org/wiki/Peter_J%2E_Weinberger

[2] https://en.wikipedia.org/wiki/Brian_Kernighan

[3] https://en.wikipedia.org/wiki/The_AWK_Programming_Language


I've also seen this book mentioned elsewhere but never made the effort to read it. Hopefully I'll go through it sometime soon - it may be ~30 years old, but the core concepts are likely to work in gawk even now.

Also, the gawk book is written by Arnold Robbins, who is one of the current maintainers of the tool, and as per [1] gawk was started by David Trueman and Arnold.

[1] https://www.gnu.org/software/gawk/manual/html_node/Foreword3...


This book is not "an old version". This book is K&R-class. Moreover, the "K" is from the same person in both cases.


Hmm, the vital thing to say is that awk is a series of

   PATTERN { ACTION }
statements, where { ACTION } is optional and defaults to { print }
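
A minimal sketch of that default, assuming a made-up three-line input (the `sample` data and the function names are hypothetical, only there for illustration):

```shell
# Hypothetical sample input: name and score, whitespace-separated.
sample() { printf '%s\n' 'alice 10' 'bob 25' 'carol 7'; }

# Pattern only: the action defaults to { print }, so matching lines are echoed.
high_scores() { sample | awk '$2 > 8'; }

# The same program with the action spelled out.
high_scores_explicit() { sample | awk '$2 > 8 { print }'; }

high_scores            # prints "alice 10" and "bob 25"
high_scores_explicit   # prints the same two lines
```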

Side note; I almost always use Perl or Ruby with the -n or -p flags instead of awk. I don't need to have three or four syntaxes to remember when I can use just one.


>where { ACTION } is optional and defaults to { print }

There's more to that:

Either PATTERN or { ACTION } is optional (but not both). If no PATTERN is given, the default is "match every line" (so the given ACTION runs for every line).

The ACTION could be anything: an explicit print (whether or not PATTERN is given), summing up some column, filtering further with substr(...), assigning values to variables, comparing variables and acting on the result (awk's if statement), calling a built-in or user-defined function, etc.
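
A rough sketch of a few of those cases, assuming a made-up order list (the `orders` data, the `cost` function, and the 2.5 threshold are all hypothetical, chosen only to have something to run):

```shell
# Hypothetical input: item, quantity, unit price.
orders() { printf '%s\n' 'apple 3 0.50' 'pear 10 0.25' 'plum 4 1.00'; }

# Action only: no pattern, so the action runs for every line.
first_field() { orders | awk '{ print $1 }'; }

# Sum a column, then report once in an END block.
total_qty() { orders | awk '{ n += $2 } END { print n }'; }

# An if statement plus a user-defined function inside the action.
label_orders() {
  orders | awk '
    function cost(qty, price) { return qty * price }
    {
      if (cost($2, $3) >= 2.5) {
        print $1, "big"
      } else {
        print $1, "small"
      }
    }'
}

# substr() used as the pattern: keep items whose name starts with "p".
p_items() { orders | awk 'substr($1, 1, 1) == "p" { print $1 }'; }

total_qty   # prints 17
p_items     # prints "pear" and "plum"
```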


I studied under Al Aho a few semesters ago (the 'A' in AWK). I'm dipping my toes in Data Science in Python at my current internship, and while I'm seriously impressed at the bevy of NLP/Machine Learning algorithms I have at my disposal, I've come to realize that some of the stuff I'm working on can be solved by unix tools written in the 70s/80s. And that's dope.


This point comes up somewhat periodically on HN - the point you made. And yes, it is "dope" :)

More people should check prior art before making new stuff. Reuse / stand on the shoulders of giants / don't reinvent the wheel, etc. But NIH is alive and well ...

https://en.wikipedia.org/wiki/Not_invented_here



Thank you for this, I could not find the follow up to the first article on the IBM webpage.


"And, unlike some languages, awk’s syntax is familiar, and borrows some of the best parts of languages like C, python, and bash (although, technically, awk was created before both python and bash)."

Technically??



P+++


The best kind of correct


If anyone's interested, I created a repo that contains some common one-liners that I've used https://github.com/kirang89/awk-cookbook. There's a PDF version as well :)


In the early 2000s, I spent quite some time learning from the tutorials on the IBM developer site. Many of them are well-written and easy to read compared with blog posts. And they paid the authors.


I recently looked for a simple scripting language where I would need to do some arithmetic and some formatted output of the results (2 decimal digits). I found standalone awk scripts a good option. Bash itself is not well suited, and Python also felt like a worse option in my case. I also looked at bc as a scripting language, but its reusability and formatting features were more limited.
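
For anyone curious what that looks like, here is a small sketch, not the commenter's actual script. The `average` and `circle_area` helpers are hypothetical names; the awk parts used are `printf` with `%.2f`, a `BEGIN`-only program (which reads no input), and `-v` to pass in a value:

```shell
# Hypothetical helper: average of the numbers given as arguments, two decimals.
average() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR > 0) printf "%.2f\n", sum / NR }'
}

# Standalone arithmetic: a BEGIN-only program runs without reading any input.
# atan2(0, -1) is a common way to get pi in awk.
circle_area() {
  awk -v r="$1" 'BEGIN { pi = atan2(0, -1); printf "%.2f\n", pi * r * r }'
}

average 1 2 4    # prints 2.33
circle_area 2    # prints 12.57
```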


Here's another source of good awk examples: http://tuxgraphics.org/~guido/scripts/awk-one-liner.html



