From: Scott Williams (scott@quake.Stanford.EDU)
Date: Fri, 26 Jul 1996 14:43:08 -0700 (PDT)
To: mdiers@mdisas
Subject: Updated Monitoring Rules

I know that people have been eagerly awaiting these rules, so I am
emailing out a DRAFT DRAFT DRAFT set which has not been thoroughly
reviewed and which includes some policy proposals that have not been
uniformly accepted. This list also does NOT include some of the updated
chaining information. That will require an update to the hkgse database
and some slight mods to the screen 12 definition, which are presently
under development and review. If you are interested in the modified
screen 12, a working DRAFT version will soon be available as screen 12d.
Please feel free to comment on these monitoring guidelines and the
screen 12 explanation.

CAMPAIGN PERIOD MONITORING
--------------------------

The verification of MDI nominal flight status will continue during the
campaign period. This will be done twice a day by the EOF staff (AM and
PM) and once a day from California in the evening. The person
responsible for monitoring should check the contact times to determine
the best time to perform the check(s). The MDI Health Monitoring web
page includes a link to a DSN schedule. The web page also includes these
and other directions, plots and displays of significant housekeeping
parameters, and recent sci5k and sci160k downlinked images. This web
page is located at:

    http://mdiems.nascom.nasa.gov/health_mon/

All of the text displays included on the web page can be reached
directly on mdisas or mdiems.
The DSN contacts and MDI monitoring schedule can be found at:

    /mdisw/plan/contacts/upcoming_soho_schedule.txt
    /mdisw/doc/monitor.info/monitoring.schedule

When to Wake People

The procedures on how to contact the MDI EOF representative in case of a
telemetry loss are also applicable if there is concern about the
instrument health or if the structure program is not running. Once the
telemetry stream has been verified, the MDI EOF Office (301-286-3233 or
9054), MDI EAF Office (301-286-3251) and MDI Pager (see phone list)
should be called in that order. If the monitoring is being performed
during a long pass, then the verification should include checking that
the correct campaign is running as well. Of course, one should use
judgment: the checks should be made when there is still time to do
something about a problem. If the pass is about to end and the next
contact isn't until the morning, one could probably wait until the
morning to start calling. A voicemail can be left on the MDI EOF phone
at 9054.

MDI Flight Status Verification

A compact screen has been developed which contains all the required
information and which can be displayed on simple terminals as well as
via X window. The MDI Flight Status display (screen 12) can be invoked
in a VT100 (curses) compatible terminal with "screen 12 -t" or as an
X window display with "screen 12" from either mdisas or mdicmd, and will
update in real time. A snapshot of the screen 12 display is available on
the web page and is updated every 10 minutes.

Screen 12 allows the following verification functions to be performed:

  1) The host machine is connected to the ECS and is receiving
     telemetry.
  2) MDI power and thermal environment is nominal.
  3) The limb tracker (ISS) is closed loop and operating nominally.
  4) The DEP and application electronics are running.
  5) The DEP sequencer is running.
  6) The DEP registers are configured correctly for the dynamics
     sequence.
  7) The IP program is running nominally.

If telemetry is not being received at the time the screen program is
invoked, the screen will be displayed with blanks for all the telemetry
fields and TM_STATE will read REQ ACKN. If there are already more than
32 screens up, or if the EGSE is having a problem, then the error might
be:

**************************************************
SCREEN: Error on connect, not accepting connections
**************************************************

If the EGSE is no longer running the message will be:

**************************************************
SCREEN: Could not get MAIN shared memory partition, errno: 2
**************************************************

In any of these cases, one should also try logging on to mdicmd and
running the screen from there. The last valid screen 12 can be viewed
by:

    cat /md81/log/screen12/newest

The same directory holds all of the 10 minute snapshots, so the most
recent ones can be reviewed. The naming convention is yymmdd.hhmmss,
e.g. 960726.152003 for July 26, 1996 at 15:20:03 EDT. If the data is
flowing to mdicmd and the instrument is healthy, the files can be
transferred later to mdisas, so this does not constitute a problem that
requires waking people. If one still can't see any telemetry, follow the
procedure on mdisas for loss of telemetry, found on the web page or in
/mdisw/doc/notelemetry.procedure.

The subsystems listed above can be verified as follows:

  - Check the telemetry status.
    * Verify TM mode is MEDIUM and TM_STATE is RUNNING.
    * Verify that SVM, HK1 and HK2 packets are updating once every
      15 seconds.
    * Verify that the sci5k is incrementing by 1 to 4 packets every
      second.
    * If VC2 channel is active, verify HR MODE is FILL, RAW or
      COMPRESSED, that a valid DPC is displayed and that the sci160k
      packet count is incrementing. (A list of valid DPC codes can be
      found on the LPARL Ops web page or in
      /mdisw/dbase/cal/info/dpc_table.txt.)

  - Check that health parameters are within the acceptable limits given
    in the display 12 screen explanation on the web or in:
        /mdisw/doc/monitoring.info/healthpage.monitor
    * Verify power status (ON), current draw and voltages are in range
      and are consistent.
    * Verify temperatures are in range and consistent and that Oven
      heater status is ON.

  - Check limb tracker (ISS) errors and loop state.
    * ISS status should be closed.
    * LT PZTs should be "IN RANG" and voltages should be near 2.
    * LT average errors should be around 2.3 (2.1 to 2.5 is fine).

  - Confirm that the DEP and application electronics are running.
    * The validity of health checks will indicate that the application
      electronics are running and that the DEP is handling telemetry.
    * Verify that the LOBT is updating.
    * Verify 5k Mode is 03h and that 5k Status is valid (first digit
      should not be 4-7, c-f and second digit should not be 0, 7-f).

  - Confirm that prime sequence is running.
    * SEQ ID should begin with C4 during dynamics or structure but may
      be C6 or C7 during campaign.
    * Seq Status should be 03, 04 or 05.
    * Seq Error should be 0003.
    * Int Counter should be incrementing (1 to 96).
    * Frame No. in Seq should be incrementing.
    * OBS Frm (R0) should be incrementing.
    * Int Count (R2) should be incrementing (except during ALT, when it
      will stay at 96 while the DEP Int Count increments from 1 to 5).

  - Confirm that the DEP registers are set correctly.
    * DEP register 3 displays the VC mode (0 = No H/R, 2 = VC2,
      3 = VC3).
    * DEP register 4 should be 1 when a campaign is running. Cam Flag
      will be enabled (04h).
    * DEP register 5 (MagTelem) should be 1 during campaigns.
    * Alt Enable should be enabled (02h).
    * Check the current instance to see what the Campaign Address (R22)
      should be.

  - Confirm that the IP is running.
    * IP Status should be 48.
    * IP Error should be 3000.
    * Int Count should be incrementing once per minute.
    * IP-DEP Bank should toggle between 00h and 20h about once per
      minute.
    * Verify that IP Error Count is not increasing "rapidly" (e.g. every
      minute). There may be up to 1 F000 error per hour. If the error
      count is incrementing once every three minutes, then the VC flag
      (DEP Register R3) may have been left nonzero when the H/R
      telemetry was switched away. This could represent an early command
      link drop. More rapid errors may indicate that the IP has crashed.
    * IP Error Codes of F000, FEnn and 8071/4 have been seen during the
      dynamics period and are not cause for immediate concern. Errors
      where the counter increments but none of the other fields change
      are probably multibit EDAC IP memory errors. All non-F000 errors
      are significant and should be logged.
    * Verify that IP Error Code is not 807* (where * could be one of
      several numbers). This Error Code could indicate the queue is no
      longer enabled. Verify that IP Error is 3000 (queue enabled).

The campaign run summary and IP error summary for the last 24 hours
should be reviewed. They can be found on the web page or in:

    /mdisw/dbase/cal/info/last24hrs.cam_run_sum
    /mdisw/dbase/cal/info/last24hrs.iperr_sea

More detail on these can be found in:

    /mdisw/doc/monitoring.info/dumb_terminal.monitor

Once the verification is complete, the user should add a line to the
operator's log file. This can be done from any /mdisw/bin equipped
machine (fault, mdiems, mdicmd, mdisas, etc.) using the add2log command
or from the web page. To use the add2log command, preface your entry
with your initials and terminate with a period or a ^d on a blank line.

Example

    add2log
    INITIALS - comments
    .
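
As an illustration only, a few of the status rules listed above (the
5k Status digit rule, the Seq ID / Seq Status / Seq Error values and the
IP Status / IP Error values) could be written down as the following
Python sketch. The function names and argument layout are hypothetical
and are not part of screen 12 or any /mdisw tool; only the expected
values come from the guidelines above.

```python
HEX_DIGITS = "0123456789abcdef"


def five_k_status_valid(status):
    """Apply the 5k Status digit rule to a two-digit hex string like '02h'."""
    s = status.strip().lower().rstrip("h")
    if len(s) != 2 or any(ch not in HEX_DIGITS for ch in s):
        return False
    first, second = s
    # First digit must not be 4-7 or c-f; second must not be 0 or 7-f.
    return first not in "4567cdef" and second not in "0789abcdef"


def check_sequence(seq_id, seq_status, seq_error, campaign=False):
    """Return a list of problems in the SEQUENCE fields (empty if none)."""
    problems = []
    # C4 during dynamics or structure; C6 or C7 may appear during campaign.
    allowed = ("C4", "C6", "C7") if campaign else ("C4",)
    if not seq_id.upper().startswith(allowed):
        problems.append("unexpected Seq ID %s" % seq_id)
    if seq_status not in (0x03, 0x04, 0x05):
        problems.append("Seq Status %02Xh is not 03, 04 or 05" % seq_status)
    if seq_error != 0x0003:
        problems.append("Seq Error %04Xh is not 0003" % seq_error)
    return problems


def check_ip(ip_status, ip_error_stat):
    """Return a list of problems in the IP fields (empty if none)."""
    problems = []
    if ip_status != 0x48:
        problems.append("IP Status %02Xh is not 48" % ip_status)
    if ip_error_stat != 0x3000:
        problems.append("IP Error %04Xh is not 3000" % ip_error_stat)
    return problems
```

For example, five_k_status_valid("00h") is False (second digit is 0),
while "02h", "95h" and "B2h" all pass the digit rule.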

MDI FLIGHT STATUS DISPLAY EXPLANATION
-------------------------------------
(Found by typing: screen 12)

Typical Snapshot of MDI Flight Status Screen
###############################################################################
GMT: Wed Jul 24 13:10:10 1996                               TM: MEDIUM
MDI Flight Status(12)  TM_STATE: RUNNING  HR MODE:COMPRESSED  DPC:404e4400
SVM: 000374  HK1: 000374  HK2: 000375  SCI5K: 008966  SCI160K: 075537
POWER (A) ON    1.31 a    TIMING                    IP
m+5pc1n         5.36 v    Ref Time       4888B768h  IP Status       48h
m+5cm           5.44 v    LOBT           4888B782h  Error Stat      3000h
m+5pc2n         5.33 v    DEP                       Int Count       5B9Bh
m+5ae           5.28 v    5K Mode        03h        IP-DEP Bank     20h
Motor Cur       0.00 a    5k Status      02h
OVEN ON                   SEQUENCE                  Error Count     18
qtopts3n        35.6 c    Seq ID         C688E0C5h  Error Code      F000h
mtopts5         35.5 c    Seq Status     05h        Opcode          01E6h
OP/EP TEMPS               Seq Error      0003h      DMA Mode        AE1Ah
qtopts2n (OP)   19.0 c    Int Count      70         IP Que Add      0DC4h
mtopts6 (CCD)  -77.1 c    FrmInSeq       10237
mtopts1 (FW)    40.0 c    REGISTERS                 FLAGS
qtepts1n(PC1)   15.1 c    OBS Frm (R0)   5          Cam Flags(R16)  06h
mtepts3 (PC2)   17.4 c    Int Count(R2)  70         New Prime       00h
ISS CLOSE                 VC ID (R3)     2          Alt Enable      02h
PZT1 IN RANG    1.84 v    Cam Mode (R4)  1          New Flag        00h
PZT2 IN RANG    1.68 v    Mag D/L (R5)   0          II Flag         00h
PZT3 IN RANG    2.03 v    Cam Addr(R22)  01AEh      Cam Flag        04h
Xavg:2.25 Yavg:2.27 v     Chn Addr(R25)  01AEh      Chain Flag      00h
###############################################################################

Parameter Explanations
###############################################################################
Banner Section
-------------------------------------------------------------------------------
 1  GMT: Wed Jul 24 13:10:10 1996                               TM: MEDIUM
 2  MDI Flight Status(12)  TM_STATE: RUNNING  HR MODE:COMPRESSED  DPC:404e4400
 3  SVM: 000374  HK1: 000374  HK2: 000375  SCI5K: 008966  SCI160K: 075537
-------------------------------------------------------------------------------
LINE  COLUMN 1                Column 2                  Column 3
-------------------------------------------------------------------------------
 4  POWER (A) ON    1.31 a    TIMING                    IP
 5  m+5pc1n         5.36 v    Ref Time       4888B768h  IP Status       48h
 6  m+5cm           5.44 v    LOBT           4888B782h  Error Stat      3000h
 7  m+5pc2n         5.33 v    DEP                       Int Count       5B9Bh
 8  m+5ae           5.28 v    5K Mode        03h        IP-DEP Bank     20h
 9  Motor Cur       0.00 a    5k Status      02h
10  OVEN ON                   SEQUENCE                  Error Count     18
11  qtopts3n        35.6 c    Seq ID         C688E0C5h  Error Code      F000h
12  mtopts5         35.5 c    Seq Status     05h        Opcode          01E6h
13  OP/EP TEMPS               Seq Error      0003h      DMA Mode        AE1Ah
14  qtopts2n (OP)   19.0 c    Int Count      70         IP Que Add      0DC4h
15  mtopts6 (CCD)  -77.1 c    FrmInSeq       10237
16  mtopts1 (FW)    40.0 c    REGISTERS                 FLAGS
17  qtepts1n(PC1)   15.1 c    OBS Frm (R0)   5          Cam Flags(R16)  06h
18  mtepts3 (PC2)   17.4 c    Int Count(R2)  70         New Prime       00h
19  ISS CLOSE                 VC ID (R3)     2          Alt Enable      02h
20  PZT1 IN RANG    1.84 v    Cam Mode (R4)  1          New Flag        00h
21  PZT2 IN RANG    1.68 v    Mag D/L (R5)   0          II Flag         00h
22  PZT3 IN RANG    2.03 v    Cam Addr(R22)  01AEh      Cam Flag        04h
23  Xavg:2.25 Yavg:2.27 v     Chn Addr(R25)  01AEh      Chain Flag      00h
###############################################################################

Banner - Telemetry Status
-------------------------
Line Label           Expected Value  Description
1    GMT:            Current GMT     GMT as derived from system time.
     TM:             MEDIUM          Telemetry channel status. This will
                                     never read HIGH even if VC2/3 are
                                     active, but it could read LOW, which
                                     would indicate that the S/C has had a
                                     serious problem. In LOW mode the EXP
                                     packet will take the place of the MDI
                                     HK1 and HK2 packets. Only the
                                     temperatures and voltages may be
                                     accurate in such a circumstance.
2    MDI Flight Status(12)           Display name and number.
     TM_STATE:       RUNNING         If the telemetry has been dropped by
                                     the ECS, but it is still running, then
                                     you would see REQ ACKN.
     HR MODE:        NO DATA         sci160k data received at EOF (VC2).
     DPC:            42020fc0        8 digit hex data product code for VC2
                                     high rate data.
3    SVM:            count           S/C SVM1 packet count for connection;
                                     will increment by 1 every 15 sec.
     HK1:            count           MDI HK1 packet count for connection;
                                     will increment by 1 every 15 sec.
     HK2:            count           MDI HK2 packet count for connection;
                                     will increment by 1 every 15 sec.
     SCI5K:          count           MDI sci5k packet count for connection;
                                     will increment by approximately 24
                                     every 15 sec.
     SCI160K:        count           MDI sci160k packet count for
                                     connection; only increments during
                                     periods of VC2 contact (DEP Register
                                     R3 = 2).

Column 1 - MDI Health Parameters
--------------------------------
Line Label           Expected Value  Description
4    POWER           ON              S/C Bus A power status for MDI.
                     1.29 a          S/C Bus A current as drawn by MDI.
                                     Should be between 1.0 and 1.6 amps.
5    m+5pc1n         5.36 v          Power Converter 1 5 volt bus voltage
                                     monitored by S/C. Supplies the IP and
                                     camera. Should be between 5.1 and
                                     5.6 volts.
6    m+5cm           5.48 v          Power Converter 1 5 volt bus voltage
                                     monitored by MDI. Should be +/- 0.2
                                     from m+5pc1n.
7    m+5pc2n         5.33 v          Power Converter 2 5 volt bus voltage
                                     monitored by S/C. Supplies the DEP,
                                     Application Electronics, mechanisms
                                     and heaters. Should be between 5.1 and
                                     5.6 volts.
8    m+5ae           5.28 v          Power Converter 2 5 volt bus voltage
                                     monitored by MDI. Should be +/- 0.2
                                     from m+5pc2n.
9    Motor Current   0 a             Should not remain above 0.05 for more
                                     than a minute or two.
10   OVEN            ON              Prime Oven heater controller status.
11   qtopts3n        35.6 c          Oven temperature as monitored by S/C.
                                     Should be within 0.2 degC of
                                     35.6 degC.
12   mtopts5         35.5 c          Oven temperature as monitored by MDI.
                                     Should be within 0.2 degC of
                                     35.5 degC.
13   OP/EP TEMPS
14   qtopts2n        19.0 c          Optics Package temperature as
                                     monitored by S/C. Should be in range
                                     of 10 to 35 degC.
15   mtopts6         -80.0 c         CCD strap temperature as monitored by
                                     MDI. Should be in range of -105 to
                                     -60 degC.
16   mtopts1         40.0 c          Front window temperature as monitored
                                     by MDI. Should be in range of 10 to
                                     45 degC.
17   qtepts1n(PC1)   15.1 c          Power Converter 1 temperature in the
                                     Electronics Package as monitored by
                                     S/C. Should be in range of -5 to
                                     40 degC.
18   mtepts3 (PC2)   17.4 c          Power Converter 2 temperature in the
                                     Electronics Package as monitored by
                                     MDI. Should be in range of 0 to
                                     40 degC.
19   ISS             CLOSE           Image Stabilization System (Limb
                                     Tracker) control loop status. Should
                                     be closed except during ALT sequence
                                     magnetograms, once every 96 minutes.
20   PZT1            IN RANG 1.9n v  ISS PZT1 actuator status as monitored
                                     by S/C and MDI monitored operating
                                     voltage. Voltage should be between
                                     1 and 3 volts.
21   PZT2            IN RANG 1.9n v  ISS PZT2 actuator status as monitored
                                     by S/C and MDI monitored operating
                                     voltage. Voltage should be between
                                     1 and 3 volts.
22   PZT3            IN RANG 1.9n v  ISS PZT3 actuator status as monitored
                                     by S/C and MDI monitored operating
                                     voltage. Voltage should be between
                                     1 and 3 volts.
23   Xavg            2.25 v          ISS average error in X direction
                                     measured in volts. Should be between
                                     2.1 and 2.5 volts.
23   Yavg            2.27 v          ISS average error in Y direction
                                     measured in volts. Should be between
                                     2.1 and 2.5 volts.

Column 2 - MDI DEP/Sequence Parameters
--------------------------------------
Line Label           Expected Value  Description
4    TIMING
5    Ref Time        482DE6D0h       Reference time of last image, should
                                     be within 40h of the LOBT.
6    LOBT            482DE6C7h       Current on-board time. It will cycle
                                     from about -30 to +30 w.r.t. Ref Time
                                     in steps of 6-8 seconds.
7    DEP
8    5K Mode         03h             Set by the telecommand MBDPMOD:
                                     00h - no 5k science (engineering mode)
                                     01h - set to initialize, then returns
                                           to 00h
                                     03h - sequence/science mode
                                     04h - special limb tracker mode
9    5k Status       02h, 95h, B2h   Typically 95, 02 or B2.
                                     First digit should not be 4-7, c-f.
                                     Second digit should not be 0, 7-f.
                                     00 indicates no 5k data is flowing.
10   SEQUENCE
11   Seq ID          C40000B3h       Will begin with C4 during dynamics or
                                     ALT. Will begin with either C6 or C7
                                     during campaigns.
12   Seq Status      03h, 04h or 05h Sequence Status Codes (normally see
                                     03, 04 or 05):
                                     00=SEQ_INACTIVE - stopped sequence
                                       (either due to a commanded halt or
                                       serious error)
                                     01=SEQ_INIT - almost never seen (never
                                       lasts very long), DEP needs to be
                                       restarted
                                     02=SEQ_WAIT_CAMERA - rarely seen,
                                       check if camera shutter is open or
                                       if data is coming from camera
                                       (before any mechanism movement)
                                     03=SEQ_ACTIVE - rarely seen,
                                       designates that the sequence is
                                       running
                                     04=SEQ_WAIT_TIME - waiting for time
                                       mark (start of every minute),
                                       pictures taken based on this
                                       reference time
                                     05=SEQ_WAIT_CON - waiting for
                                       configuration or the setup of
                                       devices for next picture
13   Seq Error       0003h           Sequence Error Codes:
                                     0001=SEQ_NORMAL - benign condition,
                                       almost never seen
                                     0002=SEQ_LATE - benign condition,
                                       almost never seen
                                     0003=SEQ_BREAK - normal sequence
                                       operating condition
                                    *0004=CAMERA_ERROR - not responding to
                                       Take A Picture (TAP) command or the
                                       shutter is stuck
                                    *0005=SEQ_WAIT - ?
                                    *0006=SEQ_STACK_DEPTH_EXCEEDED - error
                                       in stacking and unstacking routines
                                    *0007=SEQ_ILLEGAL_INSTR - load
                                       corruption, syntax error
                                    *0008=SEQ_STOP_CMD - actually need to
                                       send a sequence stop command to get
                                       this value
                                    *0009=ACCESS_VIOLATION - illegal
                                       address (outside of defined address
                                       area)
                                    * = serious error conditions!
14   Int Count       43              Number of observing intervals elapsed
                                     since last reset (i.e. how long until
                                     next set of magnetograms begins)
                                     (current interval set to 1 minute,
                                     magnetogram set # = 5); counts up to
                                     96.
15   FrmInSeq        22737           Number of camera frames taken in
                                     current sequence; increments 2 or 3
                                     times per sample update (every
                                     3 seconds during active sequence).
16   REGISTERS
17   OBS Frm (R0)    N               Frame Number, increments from 1 to 20
                                     at three second intervals. Only
                                     updated at 7 second intervals, so it
                                     will appear to skip some values.
18   Int Count(R2)   43              Cycle number (see interval count,
                                     0 - 96).
19   VC ID (R3)      3               Virtual Channel number:
                                     0=no HR
                                     1=reserved for Dnyanesh
                                     2=HR - to GSFC/EOF
                                     3=HR - not to EOF
20   Cam Mode (R4)   0               Campaign Flag, copy of R66 when right
                                     side active.
21   Mag D/L (R5)    0               Downlink of magnetograms:
                                     1=in HR 160k, immediate downlink
                                     0=in LR 5k, stored until tape dump
                                     (R5 is ignored if R3 doesn't = 0)
22   Cam Addr(R22)   0000h           Campaign Address, can be any of the
                                     onboard campaigns depending on daily
                                     plan. 0000 is cam_vc2_obs.
23   Chn Addr(R25)   0000h           Chain Address for next Campaign when
                                     Chain flag is ON.

Column 3 - MDI IP/Sequence Parameters
-------------------------------------
Line Label           Expected Value  Description
4    IP
5    IP Status       48h             8 bit hardware register.
                                     1st nibble = DEP & IP interface state
                                       4 = DEP & IP interface is happy
                                       0 = DEP & IP interface is not happy
                                     2nd nibble
                                       8 = IP in run mode
                                       0 = IP not on
                                     Should be 48h when IP is nominal.
6    Error Stat      3000h           3000 = PRIME 160k, Queue enabled
                                     2800 = REDUNDANT 160k, Queue enabled
                                     1000 = PRIME 160k, Queue disabled
                                     0800 = REDUNDANT 160k, Queue disabled
                                     07ff = Other IP error bits (seldom
                                       seen, but would indicate a
                                       potentially serious problem)
7    Int Count       NNNNh           Count of the number of interrupts to
                                     the DEP. Updates approximately once
                                     per minute during dynamics.
8    IP-DEP Bank     0h or 20h       Toggles every minute between 0 and 20
                                     when sequence is running.
9
10   Error Count     0               Count of IP errors. If the count
                                     increments but the Error Code, Opcode,
                                     DMA Mode and IP Que Addr stay the
                                     same, then it was probably an EDAC
                                     page multibit error.
11   Error Code      0000h           IP Errors:
                                     8071 = memory not available on DMA
                                       bus A
                                     8074 = no end of transfer on DMA
                                       bus A
                                     F000 = TM board hang. Could be caused
                                       by a loss of high rate clock to the
                                       MDI TM board (S/C moved out of MDI
                                       high rate mode without setting VC
                                       flag (R3) to zero) or by a
                                       compression error.
                                     FEnn = Bad frame to page nn (hex)
12   Opcode          NNNNh           Opcode of IP command which initiated
                                     error condition.
13   DMA Mode        NNNNh           Mode of the DMA during error. For an
                                     F000 it records a count of the F000
                                     errors.
14   IP Que Add      NNNNh           Address of Opcode in IP macro queue
                                     which caused error condition.
15
16   FLAGS
17   Cam Flags(R16)* 06h             Sequence
18   New Prime       00h             new prime sequence (00h or 01h)
19   Alt Enable      02h             alt enable (00h or 02h)
20   New Flag        00h             new campaign enable (00h or 08h)
21   II Flag         00h             II campaign enable (00h or 10h)
22   Cam Flag        04h             campaign enable [R66, R4]
                                     (00h or 04h)
23   Chain Flag      00h             chain campaign enabled (00h or 20h)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*Cam Flags (R16) Explanation

This register is a collection of 6 other registers. The Hex column gives
the hex value shown on screen 12 if the flag is set.

Reg  Name                    Hex  Description
R64  New Prime Table Flag    1    At the end of the 1 minute sequence the
                                  DEP will go to a new frame list for the
                                  prime sequence.
R65  Alt Enable              2    The alt seq will run when the Int Count
                                  hits the prescribed value.
R66  Campaign Flag           4    At the end of the 1 minute sequence DEP
                                  will begin running a campaign and will
                                  continue until the flag is turned off.
R68  New List Flag           8    At the end of the 1 minute sequence DEP
                                  will begin a new campaign.
R70  II Flag                 10   DEP will accept a command from another
     (Intra-Instrument)           instrument to begin a campaign.
                                  (Current plan is to never use this.)
R72  Chain List Enable Flag  20   At the end of the campaign a new
                                  campaign will begin.
R76  Chain Terminator Flag   40   At the end of the campaign chain no new
                                  campaign will begin.

All of these flags have addresses associated with them. The address is
the location of the campaign in the DEP that is to be performed when the
flag gets enabled.

Note: Campaigns will occasionally be moved around within the DEP to make
room for new campaigns, therefore the address of certain campaigns will
change with time. To see the latest address of a campaign, look on the
LPARL Ops web page.
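
As a worked example of the Cam Flags (R16) table above, the following
Python sketch splits an R16 value into the individual flags. It assumes
that R16 is simply the bitwise OR of the "hex value if set" entries,
which is consistent with the snapshot (06h = Alt Enable 02h + Cam Flag
04h) but is not stated outright in these notes. The names CAM_FLAG_BITS
and decode_cam_flags are hypothetical and exist only for illustration.

```python
# (mask, register, name) taken from the Cam Flags (R16) table.
# Assumption: R16 is the bitwise OR of these masks.
CAM_FLAG_BITS = [
    (0x01, "R64", "New Prime Table Flag"),
    (0x02, "R65", "Alt Enable"),
    (0x04, "R66", "Campaign Flag"),
    (0x08, "R68", "New List Flag"),
    (0x10, "R70", "II Flag"),
    (0x20, "R72", "Chain List Enable Flag"),
    (0x40, "R76", "Chain Terminator Flag"),
]


def decode_cam_flags(r16):
    """Return the names of the flags whose bit is set in the R16 value."""
    return [name for mask, _reg, name in CAM_FLAG_BITS if r16 & mask]


if __name__ == "__main__":
    # Value from the typical snapshot: Cam Flags(R16) 06h.
    for name in decode_cam_flags(0x06):
        print(name)
```

With the snapshot value 06h this lists Alt Enable and Campaign Flag,
matching the Alt Enable 02h and Cam Flag 04h entries in the FLAGS column.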