Hi Glenn.

I hope you understand that I'm with you on getting the stuff in place to make Fibre Channel work for VMS--perhaps I didn't come across that way yesterday.

Also, there is a mechanism that has been discussed that would allow you to find out whether certain commands have completed. The idea is to have a log in the device that records the completion status of all commands. Then an initiator could periodically poll the device and find out what the status is of previously issued commands. The difficulty is that this is a moderately large perturbation to the current way things work. An advantage of it is that it was proposed by a guy from IBM.

However, I'm still confused about your exact difficulty. Suppose we had the two-phase status you proposed, the first being the Class 2 ACK and the second being the SCSI status. The ACK indicates that the command and data were successfully moved to the device, and the good status indicates that the data was successfully written to the media. What this allows you to do is issue a command, send the data, wait for the ACK, then send the next command.

My question is, how is this better than the current situation? Suppose the media write operation fails. At that point you have a bunch of commands already at the device, so the cleanup operation is complicated. Isn't this complication exactly the same as you would get if a deferred error occurred in the current proposal?

Doug.

MAIL> reply
To: STARCH::HAGERMAN
Subj: RE: more on tapes etc.
Enter your message below. Press CTRL/Z when complete, or CTRL/C to quit:

The recovery from errors would be complex, and if a command fails and several more are in the pipe, we would indeed have to wait till all had finished so's to drain things, possibly sending command(s) to quiesce the tape, read position, then figure what to do to move it...or decide the tape was toast and maybe arrange some slightly higher layer to change tapes.
(I'm working on a failover scheme for VMS SCSI now, which uses what amounts to a streams approach. Once that gets accepted, it might be extended to other devices than disks and perhaps turn into a slightly more generic interface that users might be told about. ["slightly" because it's fairly generic as it is.])

The advantage as I see it to having the early "I got the command" ack is that we know we need to write the tape sequentially, and can start that up faster if we get the class 2 ack. Also, we'll keep the I/O processes running till we get a status to report to the user. If a failure occurs before then, we still have the data and can retry operations. If on the other hand the status is a bogus "ok" and a deferred error occurs later, we have by then told the application that things are OK, because that is what we got from the device. A deferred error is too late in that the I/O is now long gone and the app has been told things were OK (and may have done Lord knows what to the only copy of the data left, thinking it could get it from the tape again).

Yes, it is possible to just say we won't return the status to the user or complete the I/O till we do a polling operation on the tape to be sure the data really got back and then complete the operations. It is a serious change though, and means that your I/O rate is now dependent on the clock and on how many operations you can buffer, and you'd better be able to get the extra commands turned around even in tight-buffer situations.

I also recoil somewhat instinctively over the notion of having to poll I/O devices to know when I/O is complete. Seems that you're taking the complexity out of the device on one hand by using class 3, then putting it back in by requiring both asynch notice of failure and this polling, and changing the meaning of the SCSI status in the tape device model (and Lord knows what other models) all at once. I find this seriously unpleasant.
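
To make the two-phase idea concrete, here is a minimal sketch in Python. It is a toy model, not VMS driver code and not anything from the Fibre Channel or SCSI specifications: the names TwoPhaseInitiator, PendingWrite, on_ack and on_status are invented for illustration, and it assumes statuses arrive in issue order and that a failure simply requeues every outstanding write. It shows only the point argued above: the next command may go out on the delivery ack, but nothing is reported OK to the application, and no data buffer is released, until the final status arrives.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PendingWrite:
    """One tape write the initiator still owns (hypothetical structure)."""
    seq: int
    data: bytes          # kept until final status so the write can be retried
    acked: bool = False  # phase 1: command and data reached the device


@dataclass
class TwoPhaseInitiator:
    """Toy model: pipeline on the delivery ack, complete on final status."""
    pending: deque = field(default_factory=deque)    # oldest write first
    completed: list = field(default_factory=list)    # seqs reported OK to the app
    retry_queue: list = field(default_factory=list)  # writes to reissue, data intact
    next_seq: int = 0

    def can_issue(self) -> bool:
        # The next sequential write may go out once every earlier write
        # has been acked, without waiting for its final status.
        return all(w.acked for w in self.pending)

    def issue(self, data: bytes) -> int:
        if not self.can_issue():
            raise RuntimeError("previous write not yet acked")
        write = PendingWrite(self.next_seq, data)
        self.next_seq += 1
        self.pending.append(write)
        return write.seq

    def on_ack(self, seq: int) -> None:
        for write in self.pending:
            if write.seq == seq:
                write.acked = True
                return
        raise KeyError(seq)

    def on_status(self, seq: int, ok: bool) -> None:
        # Phase 2. Assumption: statuses arrive in issue order.
        if not self.pending or self.pending[0].seq != seq:
            raise ValueError("status out of order")
        if ok:
            self.pending.popleft()
            self.completed.append(seq)  # only now is the app told "OK"
        else:
            # Simplification: a real driver would first let later commands
            # drain and reposition the tape. Here every outstanding write,
            # data still in hand, is just queued for retry.
            self.retry_queue.extend(self.pending)
            self.pending.clear()


init = TwoPhaseInitiator()
a = init.issue(b"record 0")
init.on_ack(a)
b = init.issue(b"record 1")   # sent before record 0 has its final status
init.on_ack(b)
init.on_status(a, ok=True)
init.on_status(b, ok=False)   # media write failed; the data is still held
print(init.completed)                      # [0]
print([w.data for w in init.retry_queue])  # [b'record 1']
```

In the deferred-error case the application would already have been told that record 1 was OK by the time the failure surfaced; in this sketch it has been told nothing about record 1, and the initiator still holds the bytes.
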
All this is less of an issue with the [gag] Seagate disks that can't generate status of transmission so fast, because it is at least the case that disk operations are idempotent and can just be retried. I still think some error paths will get on average a lot longer than they might otherwise be, because we won't know as much about what might be at the device and what might be in the fabric, and all we can do is wait and hope things drain off. We keep track of what is sent and what is complete, but can't track what's at the device and what's in the pipe (for fabric, I'm thinking) if no status is available ever (unless again one adds a polling scheme). Ack of packets is after all not too important, but acks of sequences could let us decide that the commands are all at the device, so it's now OK to tell the device to flush 'em all for stuff like cluster transitions or when doing some other processing that really needs to have one initiator (yeah, I know the terminology changed, though I don't see why this had to be so...) control things exclusively. If errors never hit, it's not such an issue, but our experience is that they do at times.

I don't expect tape vendors to say much, since they are perhaps not used to systems concerns where one tries to develop an abstraction for the tape that includes the ability to perform error recovery that may go much further than a single drive can. VMS Backup scratches the surface of what is reasonable and increasingly needed in really large shops. Some of the HSM folks get deeper. Key is the notion that it is NOT OK to say "if the tape fails, the cartridge is toast and we'll just tell the system that it failed awhile ago." The data must be preserved and known to be so (recapping some of my letter today 'cause I'm keeping a copy of this one) so that more elaborate schemes can be used and errors recovered from to the best ability of the tapes or the SYSTEMS.
Drive vendors tend to be selling mostly to small systems (PC class), for tapes as well as disks, and at that level, yeah, it's ok to junk a cartridge with a bad spot. If you're running security logs, dbms logs, or the like to tape, losing the logs can be real trouble. But it's the systems folks who see that, not the drive vendors.

I can imagine some moderately ugly kludges that could be used if an underlying tape abstraction were along the lines of "you get notified of failure within maybe 100 tape records, or 1000, or ..., of the point where the data failure hit." They'd be ugly, would hurt performance of the system by requiring additional data streams and management thereof, and would require some really grotesque rules for handling media (since the valid length would have to be kept somewhere else). I just don't see why anyone with a system of scale larger than a few PCs would want to use such things. I'd rather avoid having to mess with 'em.

Anyhow, the foregoing are my concerns. I hope this all helps you see where my head is, at any rate.

Glenn Everhart