==========================================================================
Pipeline configurations
==========================================================================

o Requested processing configuration identified by obi_data keyword in
  obs.par
o All pipelines process data "as far as possible"
o Possible values for obi_data:

  - standard
    . Standard pipeline for processing engineering and non-SI data;
      used when SI L1 processing is not possible
    . L0: ephem, all telem
    . L0.5: asp, obi_det, sim
    . L1: asp_obc, ephem, ephin, obc_ephem, tel

  - asp_centroid
    . Special pipeline used for aspect camera calibration - centroids
      on images of FID lights instead of guide stars
    . As 'standard', except -
      - L1 asp (centroid only pipeline)

  - primary
    . Normal science operations pipeline configuration for processing
      focal plane SI data
    . As 'standard', plus -
      - L0.5: acis, hrc
      - L1: acis, asp (full pipeline), hrc
      - L1.5: tg
      - L2: cc_acis, hrc, te_acis, tg_te_acis

  - secondary
    . Secondary science operations pipeline configuration for
      processing non-focal plane SI data; no aspect solution or
      grating processing
    . As 'primary', except -
      - L1: excludes asp (full pipeline); SI pipelines do not apply
        aspect solution; alternate GTI limits
      - L1.5: no pipelines
      - L2: no grating pipelines; executes per-L1

==========================================================================
Processing Out of order dump evaluation 12/27
==========================================================================

The short answer is that dumps can't be fed out of order to AP.
Accommodating it would require a significant amount of redesign in
telemetry and AP. The current code meets the requirements in SE03. If a
prolonged period of time passes and a dump is still missing, a workaround
exists: start a new sdp directory and proceed with AP. This is not a
routine procedure and is used only after team consultation and under the
leadership of the dutysci.
==========================================================================

The data system was developed based on the requirements specified in
SE03. The section specifying the requirements for data receipt indicates
that the OCC will provide a single overlap-removed time-ordered data
stream. Data that do not meet these criteria are to be temporarily stored
by the data system until the corresponding time-ordered data are received.

We have already made provisions in DataReceipt to handle overlapping
data, since the "overlap-removed" requirement was never met by the OCC.
The bookmarking code that handles overlapping data further complicates
the idea of feeding out of order data to the system.

The following is an evaluation, with inputs from Dave Plummer and Gregg
Germain, of the impact of running data out of order in AP. We've listed
the current issues along with rough time estimates for changing the
system to accommodate out of order dumps if that were to become a new
requirement.

--------------------------------------------------------------------------
from Dave:
----------

1. DataReceipt - Uses persistence files (and internal data structures)
   to keep track of the last VCDU seen so it can trim overlapping VCDUs
   from the beginning of each raw telemetry file. It also creates a data
   product recording the start/stop of an OBSID in telemetry (this file
   is used to kick off OBIDET and other OBI based pipelines). Several
   new scenarios would have to be accounted for in the application; for
   example, how to handle a false OBSID transition when the actual OBSID
   transitioned during the missing data. Handling out of order data
   would require a major redesign of the application with significant
   new functionality including (but not limited to) the following:

   1. Incorporating a state machine to handle the different behavior
      required in each of the following states:
      1. contiguous data mode
      2. skipping file mode
      3. file skipped - no OBSID transition
      4. file skipped - (possible) missing OBSID transition
      5. file skipped - (possible) VCDU rollover
      6. file skipped - (possible) VCDU reset
      7. backfill mode (single raw file)
      8. backfill mode (multiple files)
   2. A signalling mechanism to allow operators to manually transition
      DataReceipt between the above states.
   3. New data structures to handle the list of data gaps.
   4. Ability to trim both front and back of a raw data file in
      "backfill" mode.
   5. More complex logic to handle VCDU rollovers and resets because of
      (potentially) missing data.
   6. More complex logic to handle OBSID transitions when data is
      (potentially) missing.

   > A rough estimate for the time needed for requirements analysis,
   > design, development and testing of the new functionality would be
   > about 6 months of full-time effort for an experienced developer.

2. DR_FlowControl - De-couples DataReceipt from telemetry processing.
   Ensures that files that have gone through DataReceipt are fed in
   order to telemetry processing (pi2). Provides an entry point into the
   AP system for raw telemetry that has already been through
   DataReceipt. DR_FlowControl would need to recognize and handle the
   "contiguous data" and "backfill" modes described above.
   Estimate 1 month FTE.

3. AP infrastructure impact:

   1. Telemetry processing will also need to handle the different states
      mentioned above (as well as others). So in addition to mechanisms
      within each application for transitioning to and handling the
      different states, the AP infrastructure will need a mechanism to
      coordinate these state transitions. Estimate 4 months FTE.
   2. Startup of AP requires retrieving old dump files to feed to the
      telemetry strippers so they can initialize the buffers to the
      point they were at when we came down. This procedure would need
      logic to account for potential holes in the telemetry.
      Estimate 1 month FTE.
   3. Error recovery processing would also need similar modifications.
      Estimate 1 month FTE.
   4. There will be no impact on pipeline processing, OST, OPUS, darch,
      or cache_server because they have no requirements for data to be
      processed in order. However, note that many pipelines will be
      blocked until the skipped data is processed.

> Total (very rough) estimate to accommodate skipping raw files is
> 13 months for a FTE.

----------------------------------------------------------------------
from Gregg:
-----------

Trying to change the L0 software from a serial system to an asynchronous
one is a complete violation of its basic design precepts. In many cases
it might be possible to modify the code to handle out-of-order dumps,
but what you would get is a kludge on a system not designed to function
that way. It would be different if it had been designed to work out of
order from the start.

As I've been given only 2 days to come up with a re-design of the L0
system, the analysis below might not be complete - I might be missing
some problems. But it's an initial cut. And the time estimate is really
rough.

> A very rough guess as to how long it would take to make all this
> happen is 8 to 15 months, FTE.

In general, processing dumps out of order results in strip files that
either start ok but have the wrong last half, or have the wrong start
but the proper last half. Examples of these cases are ACA-I, ACIS.

As a rule of thumb, any instrument that has an atomic unit with any
structure to it at all (EPHIN histograms or ACIS Science Runs), and
where those atomic units can cross dump boundaries, will give you
problems.

The following analysis assumes, for the sake of examples, three 8 hour
data dumps:

   | Dump 1 || Dump 2 || Dump 3 |

But the order of processing is changed:

   | Dump 1 || Dump 3 || Dump 2 |

An atomic unit that begins at the END of Dump #1 and finishes at the
beginning of Dump #2 will come out unfinished, with a good first part
and a bad (or non-existent) second part.
Any atomic unit that starts in Dump #2 and finishes in Dump #3 will have
a bad (or missing) first part, and a good last part.

What would it take to automate this in AP? As in all subsequent
discussions of automation, the trick would be for the system to know
what's happening. Since AP runs automatically, and the commands to start
the L0 software have long ago been given, there's little an operator
could do to inform L0 that an out of order dump is occurring. It would
have to be able to conclude for itself that it's seeing Dump #3 after
Dump #1, and that it might later get Dump #2.

In general this means it would have to determine that it's missing an
8 hour dump (and not just suffering a gap), then make note of partial L0
products - both from Dump #1 and from Dump #3. Then, when you fed it
Dump #2 (out of order), it would have to conclude that this is indeed
the missing dump and then figure out how to properly finish populating
the partial strip files it made note of earlier.

AP would have to somehow know the difference between a partial strip
file that may or may not be finished off later, and a strip file that's
finished and ready for extraction. There's also the problem of strip
files that would be partial even if the dumps were processed in order.
Somehow the system would have to recognize that.

And this is if the system is expected to handle ONE dump out of order.
If it's supposed to handle many dumps out of order, the change would
easily take more than a year. Details below.

There are cases (HRC and EDE) where this is not a problem.

The areas of possible problems when processing dumps out of order are:

   The strippers
      - organization of strip files
      - bookmarking
   PI2
      - Clock Correlation/TCS
   ACIS

Extractors:
===========

Extractions will generally proceed ok - with the obvious exception of
ACIS, of course.
About the only difference you will see in the results (other than ACIS)
is that the data might be grouped a little differently - in different
file organizations - than you would get if you processed the dumps in
order.

You will also see partial entities in places where, had you processed
the data sequentially, you would have had complete entities. Examples of
this are ACIS science runs and ACA-I 6x6 and 8x8 images. Places where
this occurs are listed below.

The Strippers:
==============

The following strippers *might* work ok, without modification, if the
dumps were processed out of order:

   HRC
   EDE

There are some circumstances where they would not work ok (basically if
you DON'T allow a timeout after processing Dump #1).

The following strippers would have a problem if the data were processed
out of order:

   ACIS
   ACA-I
   ACA-C
   SIM
   EPHIN

ACIS:
=====

ACIS is, of course, the biggest problem. And it's huge. The ACIS
stripper would have to be completely re-written and that would take at
least 6 months FTE all by itself.

The ACIS data is designed to be processed in order, so if a science run
is missing a parameter block, or a science run report, it's treated as
an incomplete run. ACIS does not close an open strip file when the
system times out. This is because it MUST have the rest of the data to
complete the strip file.

Processing dumps out of order could result in at least two science runs
that have to be held up and populated later when the dump arrives.
For example:

   |     Dump 1     ||     Dump 2     ||     Dump 3     |
           [ science run 1 ]   [ science run 2 ]

Today, if you processed the dumps out of order, the strip files you
would get would look like:

   Strip File #1: [ start of SR #1, end of SR #2 ]
   Strip File #2: [ end of SR #1 ]
   Strip File #3: [ start of SR #2 ]

And when you processed Dump #4, if there was a gap which obliterated the
parameter block of SR #3, the packets of SR #3 would be tacked on to
strip file #3:

   Strip File #3: [ start of SR #2, end of SR #3 ]

These results are typical of the other strippers where there's structure
to the atomic unit (EPHIN, Sim Diag, ACA-Image, ACA-Cal).

To allow the ACIS stripper to process the data out of order
AUTOMATICALLY, the stripper would have to be thrown out. A new one would
have to be written that collects packets and puts them in piles. Once a
pile had all the requisite parts, a strip file could be generated. The
stripper would have to know which packets went into which pile, detect
when an 8 hour dump was missing, and hold that pile until it gets the
missing dump (if it ever does). It would also have to detect the
difference between a strip file missing a parameter block due to a gap,
and a strip file waiting for an out-of-order dump.

Generally, piles would be built as strip files are now, since most data
do come in order. However, the ONLY way the new ACIS stripper would know
that it's getting an out of order dump (Dump #2 when it eventually shows
up) is by the VCDU/OBT times of the newly arriving dump. That would be
the signal to trigger splicing the data into the right strip files held
in abeyance.

HRC stripper:
=============

The HRC stripper closes the strip files upon timeout. Therefore, when
it's done processing Dump #1, the stripper will time out and close the
strip file. The HRC stripper does not hold anything in memory until the
next data dump appears, so there's no fracturing of atomic units.
HOWEVER, if there's less than five minutes' delay between the completion
of processing Dump #1 and the start of processing Dump #3, the stripper
will write the Dump #3 events into the strip file that's already open
and already has Dump #1 data. In this case, there will be a HUGE bad
quality map, and the times of the events in the FITS file will have an
enormous jump. If that's a problem, the stripper would have to be
re-written so that when it detects a missing dump, it closes the open
strip file and starts a new one with the new dump.

If there's a format change in Dump #2 - where, say, the HRC goes from
being in the focal plane to next-in-line - the stripper will catch that,
so there's no problem there.

I don't see any major problem with HRC stripping.

Engineering Stripper:
=====================

As in HRC, when the system times out at the end of data Dump #1, the
strip file is closed. So there's no problem with processing the dump out
of order if a timeout occurs.

Also like HRC, if a timeout is NOT allowed to occur, then there will be
a mix of Dump #1 and Dump #3 data. In this case, there will be a HUGE
bad quality map, and the times of the events in the FITS file will have
an enormous jump. If that's a problem, the stripper would have to be
re-written so that when it detects a missing dump, it closes the open
strip file and starts a new one with the new dump.

New templates are loaded as it detects format changes.

ACA-Image:
==========

As in HRC, when the system times out at the end of data Dump #1, the
strip file is closed.

However, there would be substantial problems if Dump #3 were processed
after Dump #1:

The ACA-Image stripper builds 8 strip files - one for each ACA-I slot.
These can contain 4x4, 6x6 or 8x8 sized images. When ACA-I is building a
strip file, it collects data to form a complete 4x4, 6x6, or 8x8 image
before *writing* that image out to the strip file.

It's possible that a 6x6 or 8x8 image may straddle dumps.
So part of these images will be in Dump #1 and the rest in Dump #2. If
this is the case, when ACA-I times out after Dump #1, it holds part of
the 6x6 or 8x8 image in memory. It expects to see Dump #2 and the
remainder of the 6x6 or 8x8. If you feed it Dump #3 instead, it will
assume there's a huge gap and handle the situation as best it can. You
will get an incomplete 6x6 or 8x8. You might also get incomplete images
from the start of Dump #3, because the beginnings of those images are at
the end of Dump #2.

Then later, when you send in Dump #2, there's no way for the stripper to
know that it needs to tack that data onto an old strip file - either the
Dump #1 incompletes or the Dump #3 incompletes.

I'm not sure how one would fix this. One possibility is that if the gap
is approximately 8 hours, the code might log the names of the incomplete
strip files, remember those names if and when it sees Dump #2, and then
try to tack on the appropriate data. This possibility, if it could be
made to work, would take at least a month to design, implement and
perfect against all contingencies.

ACA-Cal:
========

As in HRC, when the system times out at the end of data Dump #1, the
strip file is closed. However, it holds frames of up to 1040 bytes in
memory until the frame is complete. As there's no connection between the
minor frame structure and the placement of these frames, no VCDU
counters and no OBT times, the frame currently held in memory will
probably be corrupted, and gap detection is impossible. The ACA-Cal
stripper looks for sync bytes in the frame, so it will most likely sync
up again.

Sim:
====

The only thing in Sim that may present a conflict would be the TLMSTATUS
data. This value is stored in the user data section of the strip file.
The value is taken from the previous strip file and placed in the new
strip file. The value toggles from zero to one.
When you skip Dump #2, the last value of TLMSTATUS you had for Dump #1
is put in the strip file for Dump #3. This may not be the correct value:
the stored value may be zero coming out of Dump #1, but would be a 1
coming out of Dump #2. If this turned out to be the case, the only way
out might be to reprocess with the simreformat tool.

Also, Sim Diag atomic units are 32 major frames, so processing the dumps
out of order would give you ACIS-like results.

EPHIN:
======

EPHIN carries over data in memory that can be processed with data from a
new dump. The data would be written out in a new strip file. It does
close the open strip file when pi2 goes comatose.

Atomic units of EPHIN are 16 major frames. Therefore it's possible to
combine data from Dump #1 with Dump #3, resulting in an error.

The system would have to recognize that Dump #2 might be missing and
hold the strip files in abeyance. Then, when it gets Dump #2, it would
backfill the relevant strip files.

PI2:
====

We'd have to re-evaluate how PI2 determines what a reset and a rollover
are. Right now PI2 will see the out of place dump and note that the VCDU
is less than the current VCDU. It will mark it as a reset and try to
handle it that way. It will then either find a fencepost that works for
the reset or not.

If perchance there was an actual reset/rollover in the latest
(chronologically) dump, then PI2 will treat the dump being added as a
large gap, in which case a new algorithm for large gaps may need to be
added.

Even if these algorithms were rewritten, the fencepost update would also
need a new algorithm: when it looks for a new fencepost, it looks for
one AFTER the last one it used (for Dump #3). If you then processed
Dump #2, it might pick a fencepost that's not nearly as good as the one
it would pick if you processed the dumps in order.

The gap files would also need to be re-adjusted, as they would not
correctly indicate the gaps between dumps.
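
To illustrate the PI2 behavior described above, here is a minimal sketch
in Python. It is NOT the PI2 code; the Dump class, the classify() and
replay() functions and all VCDU values are hypothetical, and rollover
and fencepost handling are left out entirely. It only shows why a rule
of the form "VCDU lower than the current VCDU means reset" mislabels
Dump #2 when it arrives after Dump #3.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Dump:
    """Hypothetical stand-in for a raw dump: only its VCDU range matters here."""
    name: str
    first_vcdu: int
    last_vcdu: int


def classify(prev_last_vcdu: Optional[int], dump: Dump) -> str:
    """Label an incoming dump by comparing its first VCDU to the last one seen.

    Assumed rule (simplified from the text): a VCDU that is not greater than
    the current VCDU is treated as a reset; a jump of more than one counter
    step is treated as a gap.
    """
    if prev_last_vcdu is None:
        return "first"
    if dump.first_vcdu <= prev_last_vcdu:
        return "reset"
    if dump.first_vcdu - prev_last_vcdu > 1:
        return "gap"
    return "contiguous"


def replay(dumps: List[Dump]) -> List[Tuple[str, str]]:
    """Feed dumps in the given order and record how each one is labelled."""
    labels = []
    prev_last = None
    for dump in dumps:
        labels.append((dump.name, classify(prev_last, dump)))
        prev_last = dump.last_vcdu
    return labels


# Made-up VCDU ranges for three consecutive dumps.
d1 = Dump("dump1", 0, 999)
d2 = Dump("dump2", 1000, 1999)
d3 = Dump("dump3", 2000, 2999)

in_order = replay([d1, d2, d3])
out_of_order = replay([d1, d3, d2])

print(in_order)
print(out_of_order)
```

Under these assumptions, Dump #3 fed after Dump #1 is labelled a gap and
Dump #2 fed last is labelled a reset, although no reset took place. A
real fix would need some way to tell "older data arriving late" apart
from a true reset or rollover.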
Bookmarks:
==========

Presently, the system writes a bookmark whenever it writes an atomic
unit out to a strip file. If the system should crash, you restart the
system with data BEFORE the bookmark. The system will ignore the data
until it sees the first minor frame after the bookmark (the ACIS case is
different - there it has to bookmark positions within a minor frame).

Presently the system would fail if you fed it Dump #1, then #3, then #2.
After processing #3, the bookmark would be set near the end of #3. When
you then fed it #2, all that data would appear BEFORE the bookmark in
time, and would therefore be ignored.

One quick way around this is operator-based: when you know you are
feeding dumps out of order, simply delete the bookmark files. This
guarantees that the data will be processed. So, for example, after
Dump #1 and then Dump #3 are processed, you wipe out the bookmark files
and then process Dump #2. The data in Dump #2 will not be ignored. But
your products may be corrupted.

------------------------------------------------------------------------------------
Addendum:

Each stripper would have to be modified individually. Because each
atomic unit is different, it doesn't look likely that service code could
be created once to handle holding strip files in abeyance. Only the code
that recognizes that a dump may be missing (and recognizes when the
missing dump appears) could be service code.
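
The bookmark behavior described in the Bookmarks section can be sketched
as follows. This is a toy model, not the stripper code: process() and
the integer "frame times" are hypothetical, a single bookmark stands in
for the per-stripper bookmark files, and the ACIS within-minor-frame
case is ignored.

```python
from typing import List, Optional, Tuple


def process(frames: List[int], bookmark: Optional[int]) -> Tuple[List[int], Optional[int]]:
    """Accept only frames later than the bookmark, then advance the bookmark.

    frames   -- hypothetical minor-frame times, in the order received
    bookmark -- time of the last frame written out, or None if no bookmark
    """
    accepted = [t for t in frames if bookmark is None or t > bookmark]
    if accepted:
        bookmark = accepted[-1]
    return accepted, bookmark


# Three consecutive dumps of ten frames each (made-up times).
dump1 = list(range(0, 10))
dump2 = list(range(10, 20))
dump3 = list(range(20, 30))

# Feed Dump 1, then Dump 3: the bookmark ends up near the end of Dump 3.
_, bookmark = process(dump1, None)
_, bookmark = process(dump3, bookmark)

# Dump 2 now lies entirely BEFORE the bookmark, so all of it is ignored.
kept, _ = process(dump2, bookmark)

# Operator workaround: delete the bookmark, then feed Dump 2 again.
recovered, _ = process(dump2, None)

print(len(kept), len(recovered))
```

With the bookmark in place none of Dump #2 is accepted; with the
bookmark deleted all of it is. The sketch says nothing about whether the
resulting products are correct, which is the caveat raised above.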