Ken -

This is the list of architectural investigations people are doing and their status (as of the last update; I have not yet updated it today) that I use to report to Pat and Richard what is going on. Since you have the architecture document, these are the "area" investigations that are going on (when firedrills over CLDs are not being handled instead). Group members are taking these topics very broadly, on the whole, which is just what is needed; the issues that cross topics HAVE to be covered (so that flow control, for instance, interacts with reset handling properly). What I have to synchronize here is that the important results get passed to the folks whose areas are most impacted by them, and that the final document describes a design that can work and that can be incrementally implemented.

I'm working on a "phase 2" document which will incorporate the added detail and harmonize it, and leave space for major component routines to be added. Mind, it is possible to take even what exists so far and treat it as a yardstick against which an implementation is to be measured, but this will leave many issues unsettled. That is the rationale for going to the major routine interface specification level.

The goals are primarily to make the SCSI code base more maintainable, with secondary goals of improving performance (some adapters, e.g. PKJ's adapter, suffer a 25% drop in throughput with the queue manager as implemented in Zeta) and facilitating the addition of new capability. A further goal is to get the new design done in time to get some of the first pieces into Zeta (which implies interoperation with much of the existing code base). Non-goals are things like implementing CAM, putting a complete design into the architecture, or spelling out in the architecture the application of every bit of SCSI.
The resulting document is intended to be a benchmark and rationale for high-level decisions on how SCSI is to be implemented. It will allow a design to be gauged for compliance and will suggest LOP-type investigations about what's involved in moving to it in different parts of the subsystem. I have been concerned since day 1 (well, maybe day 2 or 3, but before I had read enough about SCSI to do much) about how the migration is to take place. The rest of the group has not been required to concern itself with the constraints so much, because we wanted to get their thinking in blue sky mode about top-level organizational and functional issues. The fact that some of this represents "blue sky" thinking (and had to at its earliest stages) may have been why LOP was not followed.

I have been trying to keep people informed, though the recent scheduling issues have been a change, and I tend to want to deal with the immediate problems as well as the longer term ones, though the technical direction we have all been given to date has been to defer new functionality for the time being. Considering how easy it is to have all your time sucked up by firedrills, though, I'm beginning to see why the direction has come down so hard...

Glenn Everhart

Folks -

The following assumes some familiarity with the SCSI architecture document. These are topics which need to be further investigated to fill in details in the architecture. This is a design activity and will require that you think about issues of performance, maintainability, usefulness to customers, etc. Some topics may need only a couple of days to investigate, others maybe a couple of weeks. Please think about the topics and how long you expect to need for them. The investigations need to produce more detailed text to go into the architecture at the next level down in detail. My hope is that most of these will be relatively short, since there is plenty of additional detail to go beyond this.
We will parcel these out at or about the Thursday meeting; the idea of passing the list around beforehand is that people might have favorite topics or want clarification or further definition. The time before then will give you some chance to seek such. More details of the interfaces at various levels of SCSI and the interfaces for common routines will need to be worked out, but these questions need to come first. The times needed will be negotiated after selections.

glenn

-----------------------------

SCSI architectural issues needing investigation reports

1. Support of SDTR/WDTR and LUNs. SDTR/WDTR are ID wide, but one gets inquiry data per LUN. How should SDTR be treated (pref. for smart adapters) in terms of the various enabled bits? Ditto WDTR. Force all to be the same? Switch on reselect? Disallow certain configurations? What? The question is what operations should be performed, and when, to handle these negotiations correctly and in a fashion that supports most devices. (There are comments in the existing code about SDTR and these issues which will help.)
   My guess: week
   Sue. 1wk

2. The architecture document proposes a super-SCDRP containing command buffers as well as state information and possibly custom packing calls to port drivers to handle unusual packing (i.e., not just copying the SCSI command into a buffer), as well as some means of telling where the CMD buffer should be located. Are there any hidden issues with doing this that would cause problems? Any reasons such a proposal might negatively impact function or performance?
   My guess: 3 wks
   Sue, Glenn. 3 wks

3. How should SCSI data structures be linked together? (Remember SCSI 3 is likely to mean larger IDs, LUN numbers, and maybe wider constants.) Moving from one data structure to another is frequent, and we need to be sure a scheme in the architecture can handle growth and be efficient.
   My guess: week
   Jim. by 9/22

4. Is a single level selector sufficient for matching device "SCSI IQ" or peculiarities within class level?
   Or are there examples where additional capability lists should be maintained per device? Suggest forms for these to take if needed. Can one create a single number (or a single number per VMS function), as suggested in the architecture, that is a valid representation of a SCSI IQ, or must more dimensions be used? If so, what? (Can a SCSI device be an idiot savant?)
   My guess: 2 weeks
   Rick. Start in 1wk on all 3

5. What parts of flow control can reasonably be handled at class startio, and what needs to be done at port level? Would it be more advantageous to have flow control handled entirely at port level? The object is to avoid resource exhaustion and strive for some I/O fairness. This involves questions of whether the class busy bits are adequate, how queue depth and queue full status should interact (and at what level), and whether one should (try to) use mode pages to tell how full a TCQ queue is and adapt to the hardware. How to handle the switch to single command vs. TCQ mode is an issue too. (Should one issue bus device reset or some such to stop long operations?) Also an issue: multi-initiator busses. If one can setmode-control quotas, one might adapt total quotas so the queues would retain room even though >1 initiator is using the bus. (Since the queue manager is to be part of only those port drivers for "dumb" ports, the current flow control scheme resident in it needs to be revisited.)
   My guess: 2-3 wks
   Jim. 1+ wks

6. What do SCSI control chips supply by way of bus quality metrics? Is there any common information that can be captured and made available to users in some fashion about this, or are the control chips so different that basically no common information is conceivable? Bus quality metrics are desirable for diagnosis of field problems, possibly for field tuning of parts of SCSI, and for determining when path failover might be needed...IF it is feasible to obtain any such thing in a reasonable way.
   My guess: 2 wks
   Rick. Start in 1wk on all 3

7. At what level can RESET be handled? Is it possible to move all such handling (or most of it: some code would be common with packack handling) down to the top level port code? In general, can more specific rules of thumb be given about which errors should be handled at low level vs. in class code? The architecture document proposes a rule that you handle errors in class code unless knowledge of the error condition is complete at lower levels. Can we be more specific about various types of errors? The current use of mount verify to respond to SCSI bus reset makes its use for path failover difficult and imposes performance penalties which have no business being present; handling RESET should be done somewhere within the SCSI subsystem envelope. Since it is a bus-wide condition, it would seem logical to do so in the bus level code (i.e., in the port driver). The question is whether this is feasible, and what issues arise from doing so.
   My guess: 2-3 weeks
   Buzzy. 3 wks.

8. What SCSI knobs and switches should be made controllable for starters? (There's a good deal in the documents about things to control, but how does one set profiles?)
   My guess: week+
   Rick. Start in 1wk on all 3

9. What is needed for a driver disconnect capability? (How should one idle a device? What about long I/O operations?) (Involves disconnect, reconnect, and possibly driver unload as a further option.)
   My guess: week+
   Grace. 1.5 wks

10. How should VMS locate port drivers and initialize SCSI subsystems?
   My guess: 1-2 wks
   Grace. 1.5 wks

11. Is there a better way to pass I/O to port level than the current svapte/bcnt/boff one? What general memory management routines are needed to translate addresses?
   My guess: 2 weeks
   Buzzy. 4 days

12. How should AEN be handled? (Target mode too.) Is it best to put any of it into port code (beyond the interrupt recognition)? Should a new class driver be present?
   My guess: 2 weeks ++
   Tom. 2 wks (actually 1wk but busy with Japanese next week after)

13. Time-out.
   Can anything more general be done than having the timeout set by class level function dispatch? Adapt to devices perhaps? Again, how might one store and init a profile in clusters?
   My guess: 2 weeks+
   Jim. 2wks +

14. When should errors be logged and in what form? Can some generic rules of thumb (testable!) be given for this, beyond those in the current document? [rnote: ring buffers etc.]
   Mary. 3.5 wks.

Statuses, 9/19/95

Sue S. - Sent me 1st draft of SDTR info.
Jim D. - 240 lines written so far (5-6 pages!) on flow control.
Rick - Needs adapter documents. Leaning toward using diagnostic page SCSI commands to do bus metrics.
Buzzy - Reviewing use of mem mgt. and is finding map buffers straightforward. Designing interface for code to build scatter/gather lists.
Grace - 1 page written so far re research on locating port driver. (I had discussions with her after mtg & referred her to Sue to discuss some common issues betw. autoconfig. and SCSI connection setup.)
Tom - Starting to write up AEN & target mode. Suggests a follow-on study of whether anything in SCSI 3 will invalidate the target mode implementation we have. (Group discussion was that target mode is what we have, not really AEN.)
Mary - Looked over port drivers. Looking at class driver error reporting now. Finding lots of inconsistency. (In discussions with her last evening I told her she's finding exactly the kind of inconsistency we need to remove, which I gather & hope helped her get clear on what we need.)

----------------------

It was mentioned that a means is needed to allow a port driver to delay a very short time & be recalled. No queue mgr in scsi2common means this will be needed. Mention to Jim.

Statuses 9/21/95

Sue S - still writing. SDTR doc nearly done. Looking at sources re data structures.
Jim D - will send me something. However, interruption rate and scsi retrospective interfere; may need to offload some info.
Rick L - going OK. Skeletons entered in note file.
Buzzy R - Going well.
   Writeup on mem mgt in note file. Thinking about reset.
Grace W - still investigating. Has enough basic data.
Tom G - Hope to have some writing done by Friday. Nikon CLD a major distraction.
Mary Y - Looked over class drivers now. Thinking about the issues.
Marge S - putting more comments into port driver book; new draft in a few days.

---------------------------------

Statuses 9/26/1995

Grace - done one study; doing the second.
Marge Sherwood - making some progress on the port driver book, though that's #3 on her priority list now.
Jim Dunham - Updated flow control text some in response to my handwritten notes asking for more normative (as opposed to descriptive of existing code) text. Jim is more willing to discuss such needs verbally than is put down on paper here. However, Jim announced he's taking a job in the cluster I/O group. I asked Rick Lord to check over Jim's text and need to get back for discussions.
Sue S. - SDTR writeup done; still studying the issues in condensing more port driver inputs into a single call & data structure.
Rick L - Done his 3 writeups. Has worked on a SCSI mode program which needs to get checked into Ghost somehow; may need a review. I will look it over with him.
Buzzy R. - reviewing drivers for reset handling, but has been involved with qlogic rathole (and has the code in hand).
Tom G. - Japanese board arrived along with Tom Y., from DEC Japan, but the board did not work. Attempting to get another rush shipped in from Japan.

I told the group that some CLDs will be coming soon (some possibly as early as tomorrow) and that when their current studies are done we need to cross review. In addition, Dave Fairbanks needs a code review of PKSdriver code after about 10/10; this code supports SCSI clusters, for Ghost.

I have a slight extension of Sue's pkcdriver fix that permits disabling SDTR to any device on SCSI busses A-D running on my workstation. Sue has a copy of the code, in case it might be useful to let selected sites get around SDTR problem devices.
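
For illustration only, here is a minimal sketch in C of how a per-device SDTR disable switch of the kind just mentioned might be kept. This is NOT the pkcdriver code; every name here (sdtr_disable_mask, sdtr_set_disabled, sdtr_may_negotiate) is hypothetical, and the table layout and the 16-ID limit are assumptions. The switch is keyed by bus and target ID rather than by LUN, since SDTR is ID wide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: one bitmask per bus, one bit per target ID.
 * A set bit means "do not initiate SDTR with this ID". */
#define SCSI_BUS_COUNT 4    /* busses A-D, as in the memo */
#define SCSI_MAX_ID    16   /* assumption: leave room for wide busses */

static uint16_t sdtr_disable_mask[SCSI_BUS_COUNT];

/* Map a bus letter ('A'..'D', either case) to a table index, or -1. */
static int bus_index(char bus_letter)
{
    if (bus_letter >= 'A' && bus_letter <= 'D') return bus_letter - 'A';
    if (bus_letter >= 'a' && bus_letter <= 'd') return bus_letter - 'a';
    return -1;
}

/* Set or clear the SDTR-disable bit for one ID on one bus.
 * Returns false if the bus letter or target ID is out of range. */
bool sdtr_set_disabled(char bus_letter, unsigned target_id, bool disabled)
{
    int bus = bus_index(bus_letter);
    if (bus < 0 || target_id >= SCSI_MAX_ID) return false;

    uint16_t bit = (uint16_t)(1u << target_id);
    if (disabled)
        sdtr_disable_mask[bus] |= bit;
    else
        sdtr_disable_mask[bus] &= (uint16_t)~bit;
    return true;
}

/* True if the port code may start SDTR negotiation with this ID.
 * Out-of-range arguments are treated as "do not negotiate". */
bool sdtr_may_negotiate(char bus_letter, unsigned target_id)
{
    int bus = bus_index(bus_letter);
    if (bus < 0 || target_id >= SCSI_MAX_ID) return false;
    return (sdtr_disable_mask[bus] & (1u << target_id)) == 0;
}
```

Under these assumptions a site with a problem device at ID 3 on bus A would call sdtr_set_disabled('A', 3, true) once at startup, and the negotiation path would check sdtr_may_negotiate before sending the message. How such a setting would actually be stored and loaded is the "knobs and switches" question in topic 8 and is left open here.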