Monday, May 10, 2010

Taking advantage of field splitting.

When you type $x in bash (or your favorite shell), you may have noticed that $x can expand to several arguments. For example, if x contains two words separated by a space, an unquoted $x expands to at least two fields.
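
A quick way to see this is to count the arguments a function receives. The sketch below assumes the default IFS; count_fields is a hypothetical helper name used only for illustration.

```shell
# count_fields (hypothetical helper) prints how many arguments it received.
count_fields() {
    echo "$#"
}

x="hello world"

unquoted=$(count_fields $x)     # unquoted: $x is split on the space
quoted=$(count_fields "$x")     # quoted: the value stays in one field

echo "unquoted: $unquoted, quoted: $quoted"
```

With the default IFS this should report two fields for the unquoted form and one for the quoted form.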

Fields are not to be confused with tokens. Tokens are roughly the words you see on your command line, whereas field boundaries depend on the expansion mechanisms that were applied.

In a nutshell, field splitting happens for unquoted dollar constructs. (I consider `` obsolete, harmful, and replaced by $().)

Field splitting is controlled by the special shell variable IFS. The basic rule is to split on every character found in IFS. An important distinction is made between the characters ' ', '\t', '\n' and all the others. The former are called IFS whitespace: a run of them counts as a single delimiter, so they do not create empty fields when found consecutively.

For example, if you add ':' to IFS, the string:
A=" ::ab:c "
will expand to two empty fields, followed by "ab" and "c". The leading and trailing spaces are dropped, while each ':' ends a field, even an empty one.
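
The difference between whitespace and other delimiters can be checked by counting fields. This is a minimal sketch; count_fields and the variable names are assumptions made for the example.

```shell
# count_fields (hypothetical helper) prints how many arguments it received.
count_fields() { echo "$#"; }

OIFS="$IFS"
IFS=" :"                  # one whitespace delimiter, one non-whitespace delimiter

spaces="a   b"            # three consecutive spaces
colons="a:::b"            # three consecutive colons

n_spaces=$(count_fields $spaces)    # the run of spaces is a single delimiter
n_colons=$(count_fields $colons)    # every colon ends a field, even an empty one

IFS="$OIFS"
echo "spaces: $n_spaces, colons: $n_colons"
```

The spaces yield 2 fields ("a" and "b"), while the colons yield 4 ("a", two empty fields, and "b").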


But where are field boundaries taken into account?
When building argument lists, in bash "for X in" constructs, in the read command, in array assignments...
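
As an illustration of two of these places, here is a small sketch with read and a for loop. The sample line and the variable names are made up for the example.

```shell
line="alice:x:1000"

# read splits its input on IFS. Prefixing the assignment to the command
# limits the IFS change to this single read.
IFS=: read -r user pass uid <<EOF
$line
EOF

# "for ... in" iterates over the fields produced by the unquoted expansion.
words="one two three"
count=0
for w in $words; do
    count=$((count + 1))
done

echo "user=$user uid=$uid count=$count"
```

Here read fills user, pass and uid from the colon-separated line, and the loop body runs once per word.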

For example, if a string does not contain dash-prefixed tokens, you could do:
set $STR
to get its tokens into the positional parameters.
You can do the same more safely with arrays:
parsed=($STR)
echo "${parsed[1]}" # second element

Another special thing you can do with IFS comes from its interaction with the special parameter $*.
As you know, "$@" and "$*" are magic constructs. Both represent the command-line arguments. "$@" is a list of arguments occupying one field each (which is otherwise impossible for a double-quoted construct), whereas "$*" is a single word that uses the first character of IFS as the delimiter between the arguments inside that single field.
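
The contrast can be sketched as follows. The function names show and count_fields, and the choice of ',' as the first IFS character, are assumptions for the example.

```shell
# count_fields (hypothetical helper) prints how many arguments it received.
count_fields() { echo "$#"; }

show() {
    n_at=$(count_fields "$@")       # one field per original argument
    n_star=$(count_fields "$*")     # always exactly one field
    joined="$*"                     # arguments joined by the first character of IFS
}

OIFS="$IFS"
IFS=",$IFS"                         # ',' becomes the first character of IFS
show "a b" c d
IFS="$OIFS"

echo "$n_at $n_star $joined"
```

The argument "a b" keeps its inner space in both forms. Only the separator placed between arguments depends on IFS.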

For example, you could build a string of pipe-separated tokens this way:

OIFS="$IFS"
A=( $STR )
IFS="|$IFS"
egrep \("${A[*]}"\) file.c
IFS="$OIFS"

You would obtain something like (token1|token2|...|...).
This is useful for generating strings that you would otherwise build with a loop or a list-joining method in another language.
It avoids "off by one" errors too.
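
The same trick can be wrapped in a small function so that IFS never has to be saved and restored by hand. This is only a sketch: join_by is a hypothetical name, and only the first character of the separator is used, as with "$*" in general.

```shell
# join_by SEP ARG... (hypothetical helper): prints the ARGs joined by the
# first character of SEP.
join_by() {
    (                       # subshell: the IFS change cannot leak out
        IFS="$1"
        shift
        printf '%s\n' "$*"
    )
}

pattern="($(join_by '|' foo bar baz))"
echo "$pattern"
```

The resulting pattern could then be passed to egrep as in the snippet above.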

Of course, the positional parameters ($1, $2, ...) are tempting with their simple syntax and their associated set and shift commands, but keep in mind that someone could slip a dash-prefixed string in among the tokens, thus modifying your bash options... Not good! (I recommend using - or -- in that case.)
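
A minimal sketch of the safer form, assuming STR holds untrusted input whose first token starts with a dash:

```shell
STR="-x foo bar"

# Without "--", "set $STR" would treat -x as a shell option (xtrace)
# instead of storing it as the first positional parameter.
set -- $STR

first="$1"
count="$#"
echo "first=$first count=$count"
```

With "--", everything that follows is taken as a positional parameter, so -x ends up in $1 and the shell options are left alone.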
