Report on MTA databases

Goals:

The Memos is prepared March 3, 2001, as a summary of the results of testing
the "3day" database. In this memo I summarize our analysis process, present
the results of the evaluation and make specific recommendations for moving
forward. An executive summary of this report is:
The data bases look fine except for the known timing problems. Some
databases are missing. There were some edge effects in the databases which
are not considered significant. The dataseeker is a great interface, but
needs thorough testing. Changes/enhancements are requested to the timing
scheme and data seeker. As an appendix the outline of additional database
has been attached.

Process

We employed a multi-pronged approach to the analysis:

   * We dumped each database directly to ascertain content and completeness
   * We used the dataseeker to dump each table into a fits file and compare
     to the input MTA_STATIC files.
   * We used the trends tool graphically compare the contents of the
     database to the input data.
   * A fourth approach, compared the databases to the raw spacecraft, data
     had to be abandoned as the spacecraft units and usually not acceptable
     its units. We also compared databases against those describe in spec.
     by Alesha this living document is a direct descendant of the original
     specification. The mapping of MTA data products to databases is most
     clearly made here.
   * What We did not do was to check that each element was in the requested
     database. In other words we have not gone through with a check list
     that the database match one to one with the input files on an msid by
     msid basis. That process will be carried out this week (March 3-10 2001
     - dependent on the weather this week)

Evaluation

Database data from dataseeker was compared with re-run daily report data.
Timing is such that at time n, average is given for the data points within
the next n+300s. Data gap in all tables: 88936800 - 88938300s, other gaps in
individual tables appear below. Average and standard deviation and limits
have been spot checked for all tables/columns - no problems.

Permission/system problems seem prevalent. SQL calls which work at on time
return permission errors at others. Part of the problem is that the
databases seem to contain many obsolete tables.

   * The DATABASES


     Database mtasim
       Table _simactu_avg
         *** - oddities
           faedge, fatabwid, mrmdest, tscedge, tsctabwid
             original data resolution ~1050 s.
             _lim alternates 0 to 1
       Table _simtemp_avg - no problems
       Table _simelec_avg - no problems

     Database mtahrc
       Table _hrctemp_avg - no problems
       Table _hrcelec_avg - 0 values in first row
       Table _hrcveto_avg - no problems
       Table _hrcchk_avg - no problems

     Database mtasubsys
       Table _subbatt_avg NO DATA  (superseded?)
       Table _subelec_avg NO DATA   (superseded?)
       subeleca_avg         permission denied   (superseded?)
       subelecb_avg         permission denied   (superseded?)
       batt_temp_avg       - no problems
       sc_anc_temp_avg     - no problems
       sc_main_temp_avg    - no problems
       spcelecb_avg        - no problems
       spceleca_avg        - no problems
       epsbatt_avg         - no problems
       spcelec_avg         - no problems

     Database mtaephin
      ephkey_avg  - no problems
      ephrate_avg - no problems
      ephtv_avg   - no problems
      ephhk_avg   - no problems

     Database mtaacis
      aciseleca_avg - no problems
      aciselecb_avg - no problems
      acismech_avg  - no problems
      acistemp_avg  - no problems

     Database mtatel
      gratgen_avg           - no problems
      hrmagrad_avg          - NO DATA
      hrmaheaters_avg       - NO DATA
      hrmastruts_avg        - no problems
      hrmatherm_avg         - no problems
      obagrad_avg           - NO DATA
      obaheaters_avg        - NO DATA
      obfwdbulkhead_avg     - no problems
      precoll_avg           - no problems

     Database mtapcad
      pcaddrift_avg           appeared redundant w/pcadgdrift but missing Navg
      pcadgdrift_avg          - no problems
      pcadftsgrad_avg         - no problems
      pcadtemp_avg            - no problems
      pcadrwrate_avg          - no problems
      pcadgrate_avg           - no problems
      pcaditv_avg             - no problems

   * CONFIGURATION DATABASES
     Database data was compared with archived eng0 files. Timing is such
     that at time n, configuration at time n+300s is shown. These database
     also contained mismatched data from March 2000, these were ignored.

     Database mtaconfig
       Table ccdm_config           - no problems
       Table hrc_config
          - occasional NULL rows like @ 89039700
          - also 2ohiradf and 2oloradf have NULL sometimes
          - other transitions check
       Table tel_config           - no problems
       Table mta_sim_config       - no problems
       Table mta_pcad_config      - no problems
       Table mta_acis_config      - no problems


   * EPHEMERIS DATABASES
        o mtacriteria not found
        o there is a table mta_sc_criteria under mtaconfig DB, but it's not
          5-minute averaged.

   * THE DATASEEKER
     The Dataseeker received heavy usage during this period. if proved the
     validity of design and seemed easy to improve and debug on the fly. The
     primary search criteria were not interfaced with the ephemeris data in
     mta_sc_criteria. Queries on primary criteria other than time were slow.
     This search is a perl search of an RDB table. Both PERL and RDB. Can
     cause the slowness. Many other smaller defects have been noted in
     Dataseeker. These are too numerous and miniscule to note here and are
     being dealt with directly by the programmer. Here I list some
     particular issues many were addressed in near real-time, kudos to Ryan
     for that,
        o Could not demonstrate Primary search criteria other than time
          (bug)
        o Could not demonstrate Secondary search criteria (bug)
        o Dataseeker was missing certain tertiary tables.
        o fits headers should capture query -enhancement & done
        o Several tertiary enhancements done

   * TRENDS tool did not receive heavy usage. This is primarily because it
     is designed to look at long time slices of data. However it did
     function correctly with the data bases and demonstrated its value.
     COMMENTS:
       1. TRENDS TOOL will need a top page at some point as described on
          page 7 of the monitor spec. I believe this exists and a minor bug
          prevented correct function
       2. TRENDS TOOL will also require middle level pages (i.e. the intent
          is that the links referred to above link to page like
          file:/data/mta1/TRD_TEST/report/20001026/aciselecb/html/aciselecb.html
          which represent the entire (week, month, year, mission). Again I
          believe this exists, but was disabled via a minor bug.
       3. -the database has renamed all out msids as _oldmsid_avg We need to
          translate this back to oldmsid so we can pick up current limits
          and the description. is this best handled by the code stripping of
          the extraneous characters or a look up table? This table can be
          produced by Science, Takashi is working on a smaller scale version
          for the comps tool (comma delimited).

Recommendations

These databases stand to be a tremendous boon to investigation of spacecraft
performance and may well be able to serve in science investigations as well.
There are a few issues that need to be addressed though.

   * Enhancement request:
     Timing: The timing, as stands, is inconsistent and inappropriate. This
     is as much an error of oversight than anything else. Change the 300
     second time step requirement to 328 seconds (exactly 10MF). more
     properly the timing of the bins should be a parameter. Changing the
     parameter may cause re-populating the databases from Time 0, but more
     likely only be a forward moving effect in the database.
     Presumed impact: At this point is is only an impact on 2 existing DBs.
     The config and the SIB. The timing needs to be adjusted in the config
     databases. The other 16DBs can be dealt with in the new MTA code.
   * CONFIGURATION DATABASES:
     The configuration at any given database time, should be the spacecraft
     configuration at that time (currently the spacecraft configuration is
     given at the endtime of the window. This is confusing to the user.

     > example (assume scenario 1 above...I prefer it).  We could TS for
     > this database at 328 if parametriaztion is too much of a hit.
     >
     > time   config     DB_time  DB_config
     > 0         A          0         A
     > 100       A
     > 200       A
     > TS        A
     > 400       B
     > 500       B
     > 2TS       B         656          B
     > 700       C
     > 800       D
     > 3TS       D         984          D
     > 1000      A
     > 1100      B
     > 4TS       A         1212         A
     > 1400      A
     >
       Note I there is no need for a record at time 328 as it is still
     config A

       Note II- Determining a new configuration with bad quality.
     This note is changed from email versions based on input received
     from  MCD. Bad quality data DOES represent a new state.  But The
     unknown datum should be stored as a Tnull (or NAN as appropriate)

      consider 3 configurations a,b & c   100011 a
                                          100X11 b
       and                                100X01 c
      Where "X" represents bad quality data.  They should represent 3
      separate DB entries.

   * Gaps: Properly the database has no knowledge of gaps. Data base
     correctly leave X*328 second gaps. Since table map to files. There
     should be no partial rows only non-existence rows. If the tables do
     merge data from different input files, the database table need a NaN or
     TNULL as appropriate, in the place of the nonexistent data in the row.
     This is a DB responsibility.
     Merging tables is also an issue. The data time must be held to
     religiously. gaps in one table which correspond to data in a merging
     partner need to be filled with NaN or TNULL as appropriate. The
     responsibility here is on the DATASEEKER.

   * Next Steps
     I declare this test successful, there were no show stopping surprises
     and no errors in segments of the code not already scheduled for revamp.
     I suggest we proceed as follows: (All steps and dates negotiable).
        o Begin population of current databases starting with Jan 1, 2001.
          Original population for entire month of January.
        o Hand off ingestion procedures to CXCDSops. CXCDSops work to ingest
          February and March data by March 31. April data by April 30. and
          so on until parallel development complete. I understand that this
          is a lot of labor, even if it is once a month. But the development
          track to automation seems to be about 3 months minimum. Meaning
          July before meaningful databases were available. This way
          meaningful databases could exist as within 2 weeks and continue to
          be meaningful until the final databases are ready.
        o In parallel, Science will continue testing on larger data base,
          this test will involve all DOSS. Report due by April 14.
        o Also in parallel SCIOPS should work on new averaging pipeline.
          INTEG should work ingestion into OPUS.

LONG TERM-NEW DATA BASES:

There have always been a second set of tables in the offing. It looks like
development could start soon. Here is a level B specification. I expect 2 or
3 rounds of back and forth between myself and the programmers until these
are finalized. Specifically I expect a revision of these in the April 15
report.

NEW TABLES:
Executive listing of new DBTs in priority order

  1. - Level 2 sources*
  2. Primary data
  3. Gratings (3)
  4. ACIS Background (should move up to 2 when the SIB is done)
  5. HRC BACKGROUND (should move up to 3 when the SIB is done)
  6. EPHIN L1
  7. ACA

*An RDB form of this exists and should be brought into dataseeker as soon as
possible.
The priorities may not reflect the state of code. For example the \ relevant
ACA codes are done, so table creation may be strait forward. There is no new
code requirement for the Level 2 database table so this is expected to be
relatively easy.
details:

PRIMARY DATA

These data are currently provided by SCIENCE to dataseeker.
As such they exist in RDB format, this slows the operation of the
dataseeker, SI-mode is also not captured. Primary data are essentially
the observational criteria of each observation and so could probably
be gleaned from the obspar, so long as we can insure that these
are "as flown."

Primary data include:
Tstart, tstop, RA, DEC,OBSID, SI, Grating, SI-mode

timestart = last TS time before the first photon
timestart = first TS time after the last photon

EPHIN L1

the binning time on this can be 4 ST (1312 seconds instead of 328 seconds)
These are level 1 data Not L0 as in the other database.

ARCHIVE SPACE ESTIMATE 1 record of ~20 COLUMNS/20 MINUTES

This could supersede the other one (EPKEY) though

 SCP4GM          Hz      1E   sci coincidence counter P4GM rate
 SCP8GM          Hz      1E   sci coincidence counter P8GM rate
 SCH4GM          Hz      1E   sci coincidence counter H4GM rate
 SCH8GM          Hz      1E   sci coincidence counter H8GM rate
 SCCT0           Hz      1E   sci coincidence counter CT0 rate
 SCE150          Hz      1E   sci coincidence counter E150 rate
 SCE300          Hz      1E   sci coincidence counter E300 rate
 SCE1300         Hz      1E   sci coincidence counter E1300 rate
 SCE3000         Hz      1E   sci coincidence counter E3000 rate
 SCINT           Hz      1E   sci coincidence counter INT rate
 SCP25GM         Hz      1E   sci coincidence counter P25GM rate
 SCP41GM         Hz      1E   sci coincidence counter P41GM rate
 SCH25GM         Hz      1E   sci coincidence counter H25GM rate
 SCH41GM         Hz      1E   sci coincidence counter H41GM rate

ACA
This is the input to ACA trend. and was in ACA_TRD spec.

The following are the appropriate entries for the database table.

FROM FIDROPS:
time slot_ID, ID_num,id_status, OBS_ID,mag_i_avg,
mag_i_min, mag_i_max for each time interval in FIDPROPS.

FROM GSPROPS:
time slot_ID, agasc_id ,id_status, OBS_ID, mag_aca_avg,
mag_aca_min, mag_aca_max, for each time interval in GSPROPS.

FROM ACACENT

time angynea(1-8), angznea(1-8) this is a total of 16
elements for  each time interval in ACACENT.  This is a low priority data base
as it is
quite large

 Other notes for the data base: id\_status (0= good,5=marginal 9=bad)
time = (tstop+tstart)/2, dt = tstop -tstart if there is only one entry
and no time is specified (I can't tell from the ICD if that is the case).

ACIS Background
ALL SIB output
HRC BACKGROUND
ALL SIB output

ARCHIVE SPACE ESTIMATE 1 record of ~20 COLUMNS/1 SECOND OF OBSERVING TIME
(WE MAY NEED TO USE a COARSE RA/DEC HERE, THE ABOVE ASSUMES 1' resolution,
We lower are archive usage by a factor of 16 if we go to 4' resolution
i.e. per node.  I think this latter choice is the way to go.


Level 1.5
Gratings
These require an improvement to the current grate_mon code
being run by AP.  namely, it should only be run on L1.5 files and needs to
use the source location from the file to make it's guess on the location of
zero
order. SOT has been manually running these to date.
So the source data exists from which to populate the databases.
Note, timing should be to the nearest 300 interval, just like the discretes.

3 data bases are required:
general grating data  L1.5 gratings data and line data.

ARCHIVE SPACE ESTIMATE 1 record of ~75 COLUMNS/ gratings observation
(~200/year)
ARCHIVE SPACE ESTIMATE 1 record of ~10 COLUMNS/ gratings observation
(~200/year)
ARCHIVE SPACE ESTIMATE 10 records of ~9 COLUMNS/ gratings observation
(~200/year)

general grating data
this should be a copy of the output parameter file produced by grat_mon:
1 file maps to 1 record.
Here is a sample output file.
1 record per observation.

 #
 # rdb parameter file ("rpf") format
 #
 S     rpf_creation_time : "Mon Nov 20 09:39:02 2000"
 S              filename : "acisf00644_000N001_evt1a.fits"
 S              detector : "ACIS-S"
 S               grating : "HETG"
 N           file_events :  1910749.0 +/- 0.0000000
 N        abs_start_time :  90628038. +/- 0.0000000 second
 N     interval_duration :  57401.546 +/- 0.0000000 second
 N        ave_event_rate :  33.287414 +/- 0.024081206 event/second
 S           proc_method : "Sky"
 N        expnos_ratio_0 :  0.0000000 +/- 0.0000000
 N        expnos_ratio_1 :  0.0000000 +/- 0.0000000
 N        expnos_ratio_2 :  0.0000000 +/- 0.0000000
 N        expnos_ratio_3 :  0.0000000 +/- 0.0000000
 N        expnos_ratio_4 :  0.93644834 +/- 0.0000000
 N        expnos_ratio_5 :  1.0046040 +/- 0.0000000
 N        expnos_ratio_6 :  0.97997612 +/- 0.0000000
 N        expnos_ratio_7 :  0.99704790 +/- 0.0000000
 N        expnos_ratio_8 :  0.68658620 +/- 0.0000000
 N        expnos_ratio_9 :  0.97340453 +/- 0.0000000
 N         live_interval :  57401.546 +/- 0.0000000 second
 N       live_event_rate :  4.6695607 +/- 0.0090193698 event/second
 N         zo_det_events :  10873.000 +/- 0.0000000
 N          zo_live_rate :  0.18941999 +/- 0.0018165657 event/second
 S          x_coord_name : "SkyX"
 S          y_coord_name : "-SkyY"
 N              zo_loc_x :  4116.8389 +/- 0.0000000 pixel
 N              zo_loc_y :  -4105.1914 +/- 0.0000000 pixel
 N       zo_det_fwtenthm :  4.0000000 +/- 0.0000000 pixel
 N           zo_det_fwhm :  2.0000000 +/- 0.0000000 pixel
 N             roll_axay :  -78.270645 +/- 0.0000000 degrees
 N             zo_chip_x :  207.03865 +/- 0.0000000 pixel
 N             zo_chip_y :  373.82413 +/- 0.0000000 pixel
 N        det_rms_radius :  13.889139 +/- 0.0000000 pixel
 N            zo_loc_fit :  2.3640544 +/- 0.0085596079 pixel
 N           zo_fwhm_fit :  40.662022 +/- 0.38683730 micron
 N                zo_acf :  0.034153013 +/- 0.00087427217 fraction in pixel/10
 N      zo_detail_events :  7841.0000 +/- 0.0000000
 N        zo_detail_rate :  0.13659911 +/- 0.0015426313 event/second
 N          strk_loc_fit :  26.163422 +/- 6.0589576 pixel
 N         strk_fwhm_fit :  408.91910 +/- 293.44986 micron
 N              strk_acf :  0.00097127842 +/- 2.6852255e-05 fraction in pixel/5
 N           strk_events :  6009.0000 +/- 0.0000000
 N             strk_rate :  0.10468359 +/- 0.0013504469 event/second
 N             zo_loc_ax :  4119.2031 +/- 0.0000000 pixel
 N             zo_loc_ay :  -4105.1914 +/- 0.0000000 pixel
 N      heg_minus_events :  3499.0000 +/- 0.0000000
 N       heg_minus_angle :  -5.2510018 +/- 0.0000000 degree
 N       meg_plus_events :  5063.0000 +/- 0.0000000
 N        meg_plus_angle :  4.7232814 +/- 0.0000000 degree
 N      meg_minus_events :  6793.0000 +/- 0.0000000
 N       meg_minus_angle :  4.7415104 +/- 0.0000000 degree
 N       heg_plus_events :  3033.0000 +/- 0.0000000
 N        heg_plus_angle :  -5.1984057 +/- 0.0000000 degree
 N        heg_all_events :  6532.0000 +/- 0.0000000
 N         heg_all_angle :  -5.2342334 +/- 0.0000000 degree
 N        meg_all_events :  11856.000 +/- 0.0000000
 N         meg_all_angle :  4.7173090 +/- 0.0000000 degree
 N         MEGm1_Res1keV :  247.98160 +/- 0.0000000
 N         MEGp1_Res1keV :  0.0000000 +/- 0.0000000
 N         HEGm1_Res1keV :  0.0000000 +/- 0.0000000
 N         HEGp1_Res1keV :  0.0000000 +/- 0.0000000
 N     L1a_MEGm1_Res1keV :  0.0000000 +/- 0.0000000
 N     L1a_MEGp1_Res1keV :  0.0000000 +/- 0.0000000
 N     L1a_HEGm1_Res1keV :  0.0000000 +/- 0.0000000
 N     L1a_HEGp1_Res1keV :  0.0000000 +/- 0.0000000

General observation (II)

timestart = last TS time before the first photon
timestart = first TS time after the last photon

>From the L1.5 file
1 record/ spectrum observed.
TIMESTART TIMESTOP DET GRAT Shape X Y R ROTANG COMP SOURCE

Line data
Take directly from L2
1 record for each line

TIMESTART TIMESTOP DET GRAT INDEX   POS.  FWHM   AMP.   CONT.  CONT.SLOP.

Level 2 sources - We have a prototype available, the data seeker could
be hooked into this immediately.  How we want to handle time is a
question for L1.5 and L2 data products.  I think each one should come
with Tstart and Tstop which are the DB times preceding and following
the Tstart and Tstop in the datafile.

for each source
(still need Hardness numbers The tool which would create this
is not in the pipeline.
)
again take directly from L2 data
ARCHIVE SPACE ESTIMATE 20 records of ~15 COLUMNS/ observation (~1000/year)

timestart  Timestop RA DEC X Y NET_COUNTS BKG_COUNTS NET_RATE BKG_RATE SNR THETA EE50_OBS
PSF50_OBS EXT