Unix Text File Database

Unix usually comes with a set of tools that help you manipulate tab-delimited data files. Since I never really bothered to learn these, I figured I’d play with them and take notes.

cat – everyone knows this – concatenates files together. cat t1 t2 outputs t1 followed by t2.

paste – like cat, but for columns. If t1 and t2 each contain one column of data, paste t1 t2 outputs rows with the t1 and t2 data stuck together, separated by a tab. Ex:

 bash-2.05$ cat > t1
 cat
 dog
 bird
 bash-2.05$ cat > t2
 meowmix
 purina
 seed
 bash-2.05$ paste t1 t2
 cat     meowmix
 dog     purina
 bird    seed
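
paste joins columns with a tab by default, but the -d option picks a different separator. A small sketch, reusing the t1 and t2 data from above in a scratch directory (the variable name pasted is just for illustration):

```shell
# Work in a scratch directory so existing files are not overwritten.
cd "$(mktemp -d)" || exit 1
printf 'cat\ndog\nbird\n' > t1
printf 'meowmix\npurina\nseed\n' > t2

# -d replaces the default tab separator with a comma.
pasted=$(paste -d ',' t1 t2)
printf '%s\n' "$pasted"
```

This should print cat,meowmix and so on, one pair per line, which is handy for producing quick CSV-like output.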

cut – the opposite of paste. Extracts specific columns from the input.

 bash-2.05$ cat > t3
 fish    food
 shoe    clothing
 hut     shelter
 bash-2.05$ cut -f 1 t3
 fish
 shoe
 hut

You can also select by character position with -c, and both -c and -f accept ranges.

 bash-2.05$ cut -c 1-2 t3
 fi
 sh
 hu
 bash-2.05$ cut -f 1-2 t3
 fish    food
 shoe    clothing
 hut     shelter
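
cut splits fields on tabs unless told otherwise. The -d option sets another delimiter, and -f accepts a comma-separated list of fields. A sketch with made-up colon-delimited data (the file name t3.colon and the numbers are invented for this example):

```shell
cd "$(mktemp -d)" || exit 1
printf 'fish:food:3\nshoe:clothing:7\nhut:shelter:1\n' > t3.colon

# -d ':' splits on colons; -f 1,3 keeps the first and third fields.
picked=$(cut -d ':' -f 1,3 t3.colon)
printf '%s\n' "$picked"
```

The selected fields come out joined by the same delimiter, so the first line here is fish:3.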

comm – reports which lines two sorted files have in common. It prints three columns: lines only in the first file, lines only in the second file, and lines in both.
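
Here is a minimal sketch of comm in use. The file names pets and seen and their contents are made up; both files need to be sorted already. The -1, -2 and -3 options suppress the corresponding output column, so -12 leaves only the lines found in both files.

```shell
cd "$(mktemp -d)" || exit 1
printf 'bird\ncat\ndog\n' > pets    # sorted input
printf 'cat\ndog\nfish\n' > seen    # sorted input

# Suppress columns 1 and 2: only lines present in both files remain.
both=$(comm -12 pets seen)
# Suppress columns 2 and 3: only lines unique to the first file remain.
only_pets=$(comm -23 pets seen)

printf 'in both:\n%s\n' "$both"
printf 'only in pets:\n%s\n' "$only_pets"
```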

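join – like an SQL join. It matches up lines from two files that share the same value in the join field (the first field by default). Both files must be sorted on that field. Here’s an example.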

 bash-2.05$ cat > users
 1       johnk
 2       tarok
 3       yurik
 bash-2.05$ cat > tasks
 1       sleep
 1       fix things
 2       sleep
 2       take bath
 3       sleep
 3       cook
 3       yell at taro 
 bash-2.05$ join users tasks
 1 johnk sleep
 1 johnk fix things
 2 tarok sleep
 2 tarok take bath
 3 yurik sleep
 3 yurik cook
 3 yurik yell at taro
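
By default join splits on any whitespace, so "fix things" above is really treated as two fields. If the files are tab-delimited, passing a tab to -t keeps multi-word values together as one field. A sketch, assuming the same users data and a shortened tasks file:

```shell
cd "$(mktemp -d)" || exit 1
printf '1\tjohnk\n2\ttarok\n3\tyurik\n' > users
printf '1\tfix things\n2\ttake bath\n3\tyell at taro\n' > tasks

# Capture a literal tab so it can be passed as the field separator.
tab=$(printf '\t')

# -t sets the separator for both input and output.
joined=$(join -t "$tab" users tasks)
printf '%s\n' "$joined"
```

The output is tab-delimited too, so it can be fed straight into cut or another join.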

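tsort – topological sort. Not db-specific, but it can be used to analyze graph data such as dependencies: each input line is a pair "a b" meaning a comes before b. Added here because it’s interesting. The graph below contains a cycle (a, b, c, d), which tsort reports.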

 bash-2.05$ cat > graph
 a b
 b c
 c d
 e f
 f g
 g h
 h c
 d a
 bash-2.05$ tsort graph
 e
 f
 g
 h
 tsort: cycle in data
 tsort: a
 tsort: b
 tsort: c
 tsort: d
 d
 a
 b
 c
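
For comparison, here is a sketch with no cycle. The step names are invented; each pair means "the first must happen before the second". Because this graph is a simple chain, only one ordering is possible.

```shell
# Each line is a pair: left item must come before right item.
order=$(printf 'compile link\nlink test\ntest ship\n' | tsort)
printf '%s\n' "$order"
```

This prints compile, link, test, ship, one per line. With a graph that branches, several orderings can be valid and tsort just picks one of them.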

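The other useful commands are: awk (field-based processing), grep (filtering lines by pattern), sort (ordering lines) and uniq (collapsing adjacent duplicate lines).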
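
A rough sketch of how these fit together on the tasks data from the join example, assuming it is tab-delimited. The variable names are only for illustration.

```shell
cd "$(mktemp -d)" || exit 1
printf '1\tsleep\n1\tfix things\n2\tsleep\n2\ttake bath\n3\tsleep\n3\tcook\n3\tyell at taro\n' > tasks
tab=$(printf '\t')

# grep filters rows: keep only lines whose first field is 3.
user3=$(grep "^3$tab" tasks)

# sort + uniq: distinct task names (uniq only collapses adjacent duplicates,
# which is why the input is sorted first).
distinct=$(cut -f 2 tasks | sort | uniq)

# awk aggregates: count tasks per user id, then sort for a stable order.
counts=$(awk -F '\t' '{ n[$1]++ } END { for (id in n) print id, n[id] }' tasks | sort)

printf '%s\n\n%s\n\n%s\n' "$user3" "$distinct" "$counts"
```

Roughly speaking, grep plays the part of a WHERE clause, sort and uniq give you DISTINCT, and awk covers GROUP BY style counting.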