
RE: wanted: ARGV standard extension



 I vaguely remember the prior discussions on passing empty args. 
That was right before I did my temporary drop-out from the usenet
scene.  I also remember the method you outlined as being one of the
most robust. My only real objection to it was the complexity of the
code to implement it.  Call me lazy if you will, but after 22 years
of programming, I put a lot of stock in the idea of long design
leading to the simplest possible code.  

 Not that the method is a nightmare of coding by any means, but it
does mean the sender of the args has to make a couple passes of the
data before it can begin writing the args to the environment area (it
has to find the empty args first, since the way of expressing them in
list form requires a variable amount of up-front space).  

 On the receiving end, a process of tokenizing and ascii->binary 
conversion is needed.  I picture the need for something like an
is_in_null_list(argnum) function that scans the ascii ARGV= string,
tokenizing and converting ascii->binary as it goes, and this will
have to be called for any arg that starts with a space.  The empty list
can't be easily binary'd once without setting arbitrary limits on its
size or using dynamic memory allocation.  (IE, it looks like a lot of
the runtime library might get sucked into every program just so that
ARGV args can be processed.)  Runtime performance is a secondary
consideration. Now that I've grown used to having ARGV support
around, I've also grown used to abusing it in makefiles, especially
by doing things like passing 400 object module names to AR on a
single command line, and so on.  I don't like the idea of making two
passes of 400 args if it can be avoided.

 As I remember it, the last thing I proposed on usenet was a simple
escaping mechanism which was neither embraced nor definitively shot
down by presenting a situation in which it failed catastrophically.
Let me see if I can recall it and present it again in an organized
fashion...

 First, let's consider non-ARGV schemes.  xArgs already deals with
empty args; anything we do doesn't affect it.  Technically, the
basepage image of the command line also allows empty args.  The rule
is that the string is terminated by count, not contents.  A \0 in the
basepage can signal an empty arg without any problems, according to
the standard.  In reality, many implementations use the count byte to
place a \0 at the end of the string, then use strcpy(), strtok(), and
similar tools to process the image.  An embedded \0 would break
these things.  However, I think the way they'd break is pretty safe --
the program will most likely see fewer args than it expects, and
will thus whine and die.  It isn't likely that the program will break
in catastrophic or data-damaging ways.  It will also be pretty simple
to change existing routines that parse the basepage image to be
driven by count rather than using strtok() et al.
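
 A rough sketch of what a count-driven parser might look like, in C.
The name split_counted() and its signature are made up for
illustration.  It assumes args are separated by spaces, that a token
consisting of a single \0 byte stands for an empty arg, and that the
buffer has room for one terminating byte past the counted chars.  It
ignores quoting entirely.

```c
#include <stddef.h>

/*
 * Hypothetical helper: split a counted command-line image in place.
 * img   - the command-line chars (the bytes after the count byte)
 * count - the value of the count byte
 * Assumes img[count] is writable. Returns the number of args found.
 */
int split_counted(char *img, size_t count, char *argv[], int max_args)
{
    int argc = 0;
    size_t i = 0;

    img[count] = '\0';                  /* terminate the last arg */
    while (i < count && argc < max_args) {
        while (i < count && img[i] == ' ')
            i++;                        /* skip separators */
        if (i == count)
            break;
        argv[argc++] = &img[i];
        /* Scan by count, not by contents: a \0 here is data. */
        while (i < count && img[i] != ' ')
            i++;
        if (i < count)
            img[i++] = '\0';            /* turn the separator into a terminator */
    }
    return argc;
}
```

 Because the scan stops on the count rather than on a \0, a lone \0
token simply ends up as a pointer to an empty string.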

 That leaves ARGV.  My escaping mechanism can be summed up in one
sentence:  If the first character of any arg is less than or equal to
\1, that arg is prefixed with an extra \1.

 On the arg-sender's side, this is implemented as the data is being
written to the environment data area.  It examines the first char of
each arg string as it is being copied to the env area.  If the char
is <= \1, it first outputs an extra \1, then copies the arg unchanged.
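
 The sender side could be sketched in C roughly as follows.  The
function name argv_put_escaped() and its signature are hypothetical,
and the sketch assumes the arg list in the env area ends with an empty
string, as it does today.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy argc args into the env area at dst in one
 * pass, prefixing an extra \1 to any arg whose first char is <= \1.
 * Returns the number of bytes written (including the final list
 * terminator), or 0 if the buffer is too small.
 */
size_t argv_put_escaped(char *dst, size_t cap, int argc,
                        const char *const argv[])
{
    size_t n = 0;

    for (int i = 0; i < argc; i++) {
        const char *s = argv[i];

        /* First char <= \1 covers both a leading \1 and the \0 of an
           empty arg: emit the escape prefix. */
        if ((unsigned char)s[0] <= 1) {
            if (n >= cap)
                return 0;
            dst[n++] = '\1';
        }
        /* Copy the arg unchanged, including its terminating \0. */
        do {
            if (n >= cap)
                return 0;
            dst[n++] = *s;
        } while (*s++ != '\0');
    }
    /* An empty string ends the arg list. */
    if (n >= cap)
        return 0;
    dst[n++] = '\0';
    return n;
}
```

 Note that nothing has to be known about later args before the first
one is written, so a single pass is enough.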

 On the receiving end, this is implemented as the argv[] array is
being created.  The first char of each string is examined, and if it
is \1, the pointer placed in argv[] is incremented by one, so that it
points to the second char of the arg.
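
 A matching sketch of the receiving side, again in C.  The name
argv_get_escaped() is hypothetical, and the sketch assumes env_args
points at the first arg string following the ARGV= entry, with the
list ended by an empty string.

```c
#include <stddef.h>

/*
 * Hypothetical helper: build argv[] from the \0-terminated arg
 * strings at env_args. A leading \1 on any arg is treated as the
 * escape prefix and skipped. Returns the number of args stored.
 */
int argv_get_escaped(char *env_args, char *argv[], int max_args)
{
    int argc = 0;
    char *p = env_args;

    /* A string that starts with \0 ends the list; an escaped empty
       arg starts with \1, so it is not mistaken for the end. */
    while (*p != '\0' && argc < max_args) {
        char *arg = p;

        /* Advance p to the start of the next string. */
        while (*p != '\0')
            p++;
        p++;

        /* Skip the escape prefix so argv[] points at the real arg. */
        if (*arg == '\1')
            arg++;
        argv[argc++] = arg;
    }
    return argc;
}
```

 The strings are used in place; no tokenizing, conversion, or dynamic
allocation is involved.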

 An empty arg is represented in the env data as \1\0.  The \1 is
skipped by the receiver, meaning that the pointer in argv[] will
point to the \0.  An arg of \1 is represented in the env data as
\1\1\0.  The first \1 is skipped, the pointer in argv[] will point to
\1\0.  An arg of \1\2\3 is represented as \1\1\2\3\0; the pointer in
argv[] will be to the second \1.  If the arg is non-empty and its
first char is not \1, neither the sender nor the receiver takes any
special action; it works just as it does now. This strikes me as a
general solution that doesn't require multiple passes of the data on
the sending side, or tricky parsing on the receiving side.

 That leaves the issue of how unaware programs will behave, and I'll
admit that's the part I've given the least thought to in this scheme.
I'll brainstorm on the fly here, and rely on the fact that y'all may
spot problems that don't occur to me.

 First let's consider an aware sender and an unaware receiver.  For
an empty arg, the aware sender passes \1\0, and the receiver sees
exactly that.  It will probably react badly to the \1, but probably
not any worse than it would react to a space, I think.  Neither is a
valid filename, and should result in an error message.  I don't know
what other use a program might make of empty args.  A program such as
tr would translate all occurrences of \1 instead of the \0 chars you
might have had in mind. But right now such a program can't translate
the \0 chars anyway, there's no way to even ask it to.  A similar
problem arises with trying to pass an arg of \1.  The aware sender
passes \1\1\0, and the receiver might be a bit confused by getting
two chars where it expected one.  But I don't see this as leading to
to catastrophic data loss either.  

 In truth, one of the reasons I like the idea of a \1 as an escape 
is because it strikes me as a char that doesn't often show up in args 
now, and one that is likely to lead to a controlled failure of a 
program that receives it unexpectedly (because it isn't anything like a 
valid filename or option). It might be a valid char to a program that 
searches for or translates strings of characters in a file, but it 
shouldn't show up often in such contexts, and should at worst lead to 
the program not finding the strings in the file because of the extra \1.

 Now let's consider an unaware sender and an aware receiver.  I don't
know what an unaware sender is likely to do with an empty arg.  If the
sender just puts the \0 into the env area, you end up with \0\0,
prematurely terminating the args, which is just what happens right
now anyway, no change there.  The real problem here is that if an arg
starts with \1, the aware receiver is going to skip that char,
causing a possible screw-up in the receiver's behavior, because it's
skipping something that isn't validly a prefix char.  I'm tempted to
say "so what, it isn't a situation that comes up often enough to
worry about, as per the discussion above on how rare leading \1 chars
in args are."  

 But, if there's a feeling that we should care about this just for 
completeness' sake, then what we need is some extra validation that
an aware receiver can use to determine whether the sender is aware.  
In that case, we can resort to a simple marker passed as the value 
following the ARGV= part of the env.  We need only be careful that we
choose a marker that can't happen in MWC's current use of the ARGV= 
value.  I forget the details of MWC's use of that string, but I'll 
bet something as simple as ARGV=ARGV2\0 would do the trick. Then 
the receiver need only verify the presence of the ARGV2 string, and 
use that as its key on whether to skip leading \1 chars in the args.
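
 The receiver's check might be as small as the C sketch below.  The
function name is hypothetical, and the ARGV2 marker itself is only the
tentative suggestion above; it has not been checked against MWC's
actual use of the ARGV= value.

```c
#include <string.h>

/*
 * Hypothetical helper: env_var points at the "ARGV=..." string in the
 * environment. Returns 1 if its value is exactly the proposed ARGV2
 * marker, meaning the sender escaped its args, and 0 otherwise.
 */
int argv_sender_is_aware(const char *env_var)
{
    return strcmp(env_var, "ARGV=ARGV2") == 0;
}
```

 A receiver would skip leading \1 chars only when this returns 1, and
otherwise take the args exactly as it does now.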

 Well, that's my idea and my thoughts on it.  I'll be happy to 
implement (in HSC's runtime library) any reasonable scheme that 
everyone agrees on.  Frankly, with my current worries over the proper 
definition of the GEM programming interface, I won't have a lot of 
energy to spare in lobbying hard for this ARGV scheme.  In both cases,
it's accommodating the widest range of people that interests me most, 
but since I do more GEM programming than CLI-related stuff, that's 
where most of my energy will be going.

 Feel free to redistribute this reply to your mailing list for comments,
or to post it publicly if you feel that's the best forum for feedback
on it.

 - Ian  
 ianl@bix.com  
 ilepore@nyx.cs.du.edu (which just gets forwarded to me on bix anyway)