HTML (Next Steps)

Input: Several HTML files e.g. S:\IOESRI_MetaData\CLOSER_Metadata\bundles\ns_05_w2\input\wave2-html

Pre-processing: None

Parser: Parsed using a script that has not been added to the CLOSER GitLab pipeline.

Verification: The PDF should be used for entry and verification (different documentation was used for wave 8, see here).

General.

  • The questionnaires document the different modes of delivery of the survey (web, telephone, and face-to-face). We document the web mode, so information related to the other modes is not input into Archivist.

  • The order of the questionnaire in the PDF is not the same as the order that is input into Archivist. The questionnaire indicates the order of sections and questions in the actual interview, and notes where sections and questions have been documented in a different order from the interview. Input into Archivist follows the order of the interview, not the order of the documentation.

Questions.

  • The parser doesn't always pick up all of the questions in the questionnaire. Where questions in the PDF haven’t been parsed into Archivist, check that they should be included and add them.
  • Where instruction text runs on from the question literal but has the same wording and capitalisation as instruction text that appears on a new line, input it as an instruction, not as part of the question literal.

  • Questions in boxes in the PDF are typically looped or repeated questions.

Responses.

  • The PDF has three types of text answer: Open answer with defined length, Open answer without defined length, and Open type: long verbatim answer. Use Generic text for Open answers without a defined length, and Long text for Open type: long verbatim answers. Where a length is specified, choose the text type according to that length (e.g., OPEN: 500 is input as Long text, as it is over the maximum character length of 255).
  • Some text answers are missing from the PDF. Where there is no text answer, use Generic text.
  • Some code lists don’t contain unique code values, such as the code list for question HWPChl in Wave 1: 1. Every time 2. Some times 3. Occasionally 4. Never 4. Depends what it is (Spontaneous code only). Change the code list values so that they are unique: 1. Every time 2. Some times 3. Occasionally 4. Never 5. Depends what it is (Spontaneous code only).
  • In most cases, the code list values for "Don’t know" and "Refused" are missing. Use -1 for "Don’t know" and -92 for "Refused".
  • In code lists, the category "None of these" sometimes doesn’t have a code value. Use the next available number in the code list as the value.
  • All code lists are parsed with 1:1 cardinality. Check and change where appropriate.
  • Check and change the order of compound answers (e.g., a numeric answer and a code list) so that it matches the order of the PDF. 
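
The code list rules above (unique values, -1 and -92 for "Don’t know" and "Refused", and the next available number for a category with no value) can be sketched in Python. This is only an illustration of the rules, not part of the parser or of Archivist: the function name fix_code_list, the (value, label) tuple layout, and the exact-label matching for "Don’t know" and "Refused" are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Values given in the guidance for categories that are missing a code.
# Matching on these exact labels is an assumption for this sketch.
MISSING_VALUES = {"don't know": -1, "refused": -92}

# (value, label); value is None when the PDF gives no code value.
Code = Tuple[Optional[int], str]


def fix_code_list(codes: List[Code]) -> List[Tuple[int, str]]:
    """Return a copy of the code list with unique, non-missing values."""
    # Every value already present in the list, so new values never collide.
    taken = {value for value, _ in codes if value is not None}
    seen = set()
    fixed = []
    for value, label in codes:
        # Normalise curly apostrophes so "Don’t know" matches "don't know".
        key = label.strip().lower().replace("’", "'")
        if value is None and key in MISSING_VALUES:
            value = MISSING_VALUES[key]
        elif value is None or value in seen:
            # Missing or duplicate value: use the next available number.
            value = max((v for v in taken if v > 0), default=0) + 1
        taken.add(value)
        seen.add(value)
        fixed.append((value, label))
    return fixed


# The HWPChl example from Wave 1: the second 4 becomes 5.
hwpchl = [
    (1, "Every time"),
    (2, "Some times"),
    (3, "Occasionally"),
    (4, "Never"),
    (4, "Depends what it is (Spontaneous code only)"),
]
print(fix_code_list(hwpchl))
```

The result still needs checking against the PDF, since the sketch cannot tell whether a repeated value is a documentation error or an intended shared code.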

Conditions.

  • Check for missing conditions and add where needed.
  • Conditions in the PDF rarely have logic indicated in the condition text. Look for potential logic using questions in the questionnaire and add it to the CLOSER condition logic.

  • Conditions sometimes have incorrect question label capitalisation in the text or logic. Do not correct the condition text, but do correct condition labels and CLOSER logic to use the correct capitalisation as shown in the question labels in the questionnaire.
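
The capitalisation fix for condition labels and logic can be sketched as a case-insensitive lookup against the question labels in the questionnaire. This is a minimal illustration only: the function name fix_label_case is hypothetical, and the logic expression in the example uses a made-up syntax and a made-up second label, not necessarily the form the CLOSER logic takes.

```python
import re
from typing import Iterable


def fix_label_case(logic: str, question_labels: Iterable[str]) -> str:
    """Rewrite identifiers in `logic` to match the questionnaire's label case."""
    # Map each lower-cased label to its correct capitalisation.
    canonical = {label.lower(): label for label in question_labels}

    def replace(match: "re.Match[str]") -> str:
        token = match.group(0)
        # Leave tokens that are not known question labels untouched.
        return canonical.get(token.lower(), token)

    # Visit every identifier-like token in the expression.
    return re.sub(r"[A-Za-z_][A-Za-z0-9_]*", replace, logic)


# Hypothetical logic string; "Household" is an invented label for the example.
print(fix_label_case("hwpchl == 4 && HOUSEHOLD == 1", ["HWPChl", "Household"]))
```

Apply this kind of correction to condition labels and logic only; the condition text itself stays as it appears in the PDF.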