Hi Glenn.

I hope you understand that I'm with you on getting the stuff in place
to make Fibre Channel work for VMS--perhaps I didn't come across that
way yesterday.

Also, there is a mechanism that has been discussed that would allow you
to find out whether certain commands have completed. The idea is to have
a log in the device that records the completion status of all commands.
Then an initiator could periodically poll the device and find out what
the status is of previously issued commands. The difficulty is that this
is a moderately large perturbation to the current way things work. An
advantage of it is that it was proposed by a guy from IBM.
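
A rough sketch of the log idea, purely for illustration. The class name, the
fixed capacity, and the status strings are all assumptions made here, not
part of any actual proposal text:

```python
from collections import OrderedDict


class CompletionLog:
    """Hypothetical device-side log of command completion statuses."""

    def __init__(self, capacity=64):
        self.capacity = capacity        # assumed fixed log size
        self.entries = OrderedDict()    # command tag -> status string

    def record(self, tag, status):
        # Device side: note the final status of a finished command.
        self.entries[tag] = status
        self.entries.move_to_end(tag)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # oldest entry ages out

    def query(self, tags):
        # Initiator side: poll for the status of previously issued commands.
        # Tags not yet completed (or already aged out) come back as None.
        return {tag: self.entries.get(tag) for tag in tags}


log = CompletionLog(capacity=2)
log.record(1, "GOOD")
log.record(2, "CHECK CONDITION")
result = log.query([1, 2, 3])
```

Note that in this sketch a command that has not finished and one whose entry
has aged out look the same to the initiator, which is one reason the log size
and polling interval would have to be chosen together.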

However, I'm still confused about your exact difficulty. Suppose we had
the two-phase status you proposed, the first being the Class 2 ACK and
the second being the SCSI status. The ACK indicates that the command
and data were successfully moved to the device, and the good status indicates
that the data was successfully written to the media.

What this allows you to do is issue a command, send the data, wait for
the ACK, then send the next command. My question is, how is this better
than the current situation? Suppose the media write operation fails. At
that point you have a bunch of commands already at the device, so the
cleanup operation is complicated. Isn't this complication exactly the
same as you would get if a deferred error occurred in the current proposal?
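
To make the question concrete, here is a toy model of which commands are
already at the device when a media write fails. It assumes (hypothetically)
that the initiator keeps a fixed number of commands delivered but not yet
written, and that the failure is reported when the failing command reaches
the media. The function and parameter names are invented for this sketch:

```python
def commands_to_clean_up(num_commands, failing_index, pipeline_depth):
    """Return indices of commands queued behind the one that failed.

    pipeline_depth is the assumed number of commands that may be at the
    device at once, counting the one currently being written.
    """
    last_delivered = min(failing_index + pipeline_depth - 1, num_commands - 1)
    return list(range(failing_index + 1, last_delivered + 1))


stranded = commands_to_clean_up(num_commands=10, failing_index=3,
                                pipeline_depth=4)
```

In this model the stranded set depends only on how deep the pipeline is, not
on whether the failure is reported as a second-phase status or as a deferred
error.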

Doug.

MAIL> reply
To:     STARCH::HAGERMAN
Subj:   RE: more on tapes etc.
Enter your message below. Press CTRL/Z when complete, or CTRL/C to quit:
The recovery from errors would be complex, and if a command fails and
several more are in the pipe, we would indeed have to wait till all had
finished so's to drain things, possibly sending command(s) to quiesce
the tape, read position, then figure what to do to move it...or
decide the tape was toast and maybe arrange some slightly higher layer
to change tapes. (I'm working on a failover scheme for VMS SCSI
now, which uses what amounts to a streams approach. Once that gets
accepted, it might be extended to other devices than disks and
perhaps turn into a slightly more generic interface that users might
be told about. ["slightly" because it's fairly generic as it is.])

The advantage as I see it to having the early "I got the command"
ack is that we know we need to write the tape sequentially, and
can start that up faster if we get the Class 2 ack. Also, we'll
keep the I/O processes running till we get a status to report
to the user. If a failure occurs before then, we still have the data
and can retry operations. If on the other hand the status is a bogus
"ok" and a deferred error occurs later, we have by then told the
application that things are OK, because that is what we got from
the device. A deferred error is too late in that the I/O is now
long gone and the app has been told things were OK (and may have
done Lord knows what to the only copy of the data left, thinking
it could get it from the tape again).
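
A minimal sketch of the host-side bookkeeping described above, assuming a
two-phase scheme in which the ack only permits the next command to go out
and the data is released only on final good status. The class and method
names are hypothetical and do not correspond to any VMS driver interface:

```python
class HostWriteTracker:
    """Hypothetical host-side bookkeeping for two-phase tape writes."""

    def __init__(self):
        self.pending = {}     # tag -> data still held by the host
        self.acked = set()    # tags the device has acknowledged receiving
        self.next_tag = 0

    def issue(self, data):
        tag = self.next_tag
        self.next_tag += 1
        self.pending[tag] = data
        return tag

    def on_ack(self, tag):
        # Phase 1: delivery acknowledged, so the next command may be sent,
        # but the data is NOT released yet.
        self.acked.add(tag)

    def on_status(self, tag, good):
        # Phase 2: final status. Only now is the user told anything.
        self.acked.discard(tag)
        if good:
            del self.pending[tag]
            return ("complete", None)
        return ("retry", self.pending[tag])   # data is still on hand


tracker = HostWriteTracker()
t0 = tracker.issue(b"record 0")
t1 = tracker.issue(b"record 1")
tracker.on_ack(t0)
tracker.on_ack(t1)
first = tracker.on_status(t0, good=True)
second = tracker.on_status(t1, good=False)
```

With a deferred error, by contrast, the entry would already have been deleted
on the earlier "ok", so there would be nothing left to retry from.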

Yes, it is possible to just say we won't return the status to the user
or complete the I/O till we do a polling operation on the tape to
be sure the data really got back and then complete the operations. It
is a serious change though, and means that your I/O rate is now
dependent on the clock and on how many operations you can buffer,
and you'd better be able to get the extra commands turned around
even in tight-buffer situations. I also recoil somewhat instinctively
over the notion of having to poll I/O devices to know when I/O is
complete. Seems that you're taking the complexity out of the device
on one hand by using Class 3, then putting it back in by requiring
both asynch notice of failure and this polling, and changing the meaning
of the SCSI status in the tape device model (and Lord knows what other
models) all at once. I find this seriously unpleasant. It is less of
an issue with the [gag] Seagate disks that can't generate status of
transmission so fast because it is at least the case that disk
operations are idempotent and can just be retried. I still think
some error paths will get on average a lot longer than they might
otherwise be because we won't know as much about what might be at
the device and what might be in the fabric, and all we can do is
wait and hope things drain off. We keep track of what is sent and
what is complete, but can't track what's at the device and what's
in the pipe (for fabric, I'm thinking) if no status is available
ever (unless again one adds a polling scheme). Ack of packets is
after all not too important, but acks of sequences could let us
decide that the commands are all at the device, so it's now OK
to tell the device to flush 'em all for stuff like cluster transitions
or when doing some other processing that really needs to have one
initiator (yeah, I know the terminology changed, though I don't
see why this had to be so...) control things exclusively. If errors
never hit, it's not such an issue, but our experience is that they
do at times.
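
The earlier remark that the I/O rate becomes dependent on the clock and on
how many operations can be buffered can be put as back-of-envelope
arithmetic. This assumes, hypothetically, that an operation can only be
completed at a poll and that each poll can retire at most everything
currently buffered; the function name and the numbers are illustrative only:

```python
def max_ops_per_second(buffered_ops, poll_interval_s):
    """Upper bound on completed operations per second under polling.

    buffered_ops: how many operations the host can hold awaiting a poll.
    poll_interval_s: seconds between polls of the device.
    """
    if poll_interval_s <= 0:
        raise ValueError("poll interval must be positive")
    return buffered_ops / poll_interval_s


rate = max_ops_per_second(buffered_ops=8, poll_interval_s=0.5)
```

Under those assumptions, 8 buffers polled every half second cap the rate at
16 operations per second regardless of how fast the drive is.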

I don't expect tape vendors to say much, since they are perhaps not
used to systems concerns where one tries to develop an abstraction
for the tape that includes the ability to perform error recovery
that may go much further than a single drive can. VMS Backup
scratches the surface of what is reasonable and increasingly needed
in really large shops. Some of the HSM folks get deeper. Key
is the notion that it is NOT OK to say "if the tape fails, the
cartridge is toast and we'll just tell the system that it failed
awhile ago." The data must be preserved and known to be so (recapping
some of my letter today 'cause I'm keeping a copy of this one) so that
more elaborate schemes can be used and errors recovered from to the
best ability of the tapes or the SYSTEMS. Drive vendors tend to be
selling mostly to small systems (PC class), for tapes as well
as disks, and at that level, yeah, it's ok to junk a cartridge with
a bad spot. If you're running security logs, dbms logs, or the like
to tape, losing the logs can be real trouble. But it's the systems
folks who see that, not the drive vendors. I can imagine some moderately
ugly kludges that could be used if an underlying tape abstraction were
along the lines of "you get notified of failure within maybe 100 tape
records, or 1000, or ..., of the point where the data failure hit."
They'd be ugly, would hurt performance of the system by requiring
additional data streams and management thereof, and would require
some really grotesque rules for handling media (since the valid
length would have to be kept somewhere else). I just don't see why
anyone with a system of scale larger than a few PCs would want to
use such things. I'd rather avoid having to mess with 'em.


Anyhow, the foregoing are my concerns. I hope this all helps you see
where my head is, at any rate.

Glenn Everhart