Tagged Documents

Note: The original PDF should be used for entry and verification.

Parsers used to load the questionnaire into Archivist: parse_ncds_capi_txt and archivist_insert.

This method involves colour-coding the questionnaire elements in Word, adding information about response domains, conditions and loops, and then labelling the colour-coded elements in Excel. Microsoft Word often becomes slow and unstable with large documents, so save each section of the questionnaire as a separate Word document and combine them at the end, once each section has been verified.

Copy and paste the questionnaire from the PDF into Word.
Remove “junk” from the questionnaire (e.g., help screens, derived variables, page numbers), along with any stray line breaks or formatting errors introduced during the copy-and-paste process.
Colour-code the elements of the questionnaire by changing the font colour of each element type (e.g., change the font colour of all the code lists to yellow):
1. Brown = question label
2. Green = question literal
3. Red = instruction
4. Yellow = code list
5. Orange = response domain
6. Black = statement label
7. Pink = statement
8. Light blue = sequence
9. Dark blue = condition label
10. Purple = condition text
11. Grey = loop
Whilst colour-coding, move each question label onto its own line before the question literal, and add labels for the other constructs, each on its own line before the construct (e.g., add condition labels on the line above the condition text, and statement labels on the line above the statement).
Tidy up the questionnaire (i.e., group all the instructions together; check for repeated labels and re-label accordingly).
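Repeated labels are easy to miss by eye in a long section. As a minimal sketch, assuming the construct labels have already been collected into a list in questionnaire order (for example by copying them out of the document), the following reports every label that occurs more than once and where it occurs. The function name and sample labels are hypothetical.

```python
from collections import defaultdict


def find_repeated_labels(labels):
    """Return {label: [positions]} for every label that occurs more than once.

    `labels` is a list of label strings in questionnaire order. Positions are
    1-based indices into that list, so each repeat can be located and re-labelled.
    """
    positions = defaultdict(list)
    for pos, label in enumerate(labels, start=1):
        positions[label.strip()].append(pos)
    return {label: where for label, where in positions.items() if len(where) > 1}


# Hypothetical labels, for illustration only.
labels = ["q1", "q2", "c_q3", "q2", "s_intro", "q1"]
print(find_repeated_labels(labels))  # {'q1': [1, 6], 'q2': [2, 4]}
```

Leading and trailing spaces are stripped before comparing, so a label pasted with a stray space still counts as a repeat.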
Add information about the response domains. Format: Label,Type,Type 2,Format,Min,Max. See the readme on GitLab for formatting examples.
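A malformed response domain line is easier to fix before parsing than after. This minimal sketch checks a line against the six-field format Label,Type,Type 2,Format,Min,Max: it checks the field count and that Label and Type are present, but it does not know which Type, Type 2 or Format values the parser accepts (see the readme on GitLab for those). The class name, function name and example line are hypothetical.

```python
from typing import NamedTuple, Optional


class ResponseDomain(NamedTuple):
    label: str
    type: str
    type2: Optional[str]
    format: Optional[str]
    min: Optional[str]
    max: Optional[str]


def parse_response_domain(line):
    """Split a 'Label,Type,Type 2,Format,Min,Max' line into named fields.

    Empty optional fields become None. Min and Max are kept as text because
    this sketch does not assume what values they may hold.
    """
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != len(ResponseDomain._fields):
        raise ValueError(
            f"expected {len(ResponseDomain._fields)} fields, got {len(parts)}: {line!r}"
        )
    if not parts[0] or not parts[1]:
        raise ValueError(f"Label and Type are required: {line!r}")
    return ResponseDomain(*(p or None for p in parts))


# Hypothetical example line; check the readme for real values.
print(parse_response_domain("Age,Numeric,Integer,,0,120"))
```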
Add information about loops. Format: {label, _variable, start value (always “1”), end value (when provided)}. Note that loop whiles are added in Archivist, not at the tagging stage. See the readme on GitLab for formatting examples.
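The loop format above can be checked mechanically: curly brackets, a label, a variable starting with an underscore, a start value of 1 and an optional end value. A minimal sketch, assuming fields are comma-separated as in the format line; the end value is kept as text because this sketch does not assume it is always a number. The class name, function name and example loops are hypothetical.

```python
from typing import NamedTuple, Optional


class Loop(NamedTuple):
    label: str
    variable: str
    start: int
    end: Optional[str]  # kept as text; None when no end value is provided


def parse_loop(text):
    """Parse '{label, _variable, start value, end value}'; the end value is optional."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError(f"loop must be wrapped in curly brackets: {text!r}")
    parts = [p.strip() for p in text[1:-1].split(",")]
    if len(parts) == 4 and not parts[3]:
        parts = parts[:3]  # tolerate a trailing comma when no end value is given
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 fields, got {len(parts)}: {text!r}")
    label, variable, start = parts[:3]
    if not label:
        raise ValueError(f"loop label is missing: {text!r}")
    if not variable.startswith("_"):
        raise ValueError(f"loop variable should start with '_': {text!r}")
    if start != "1":
        raise ValueError(f"start value should always be 1: {text!r}")
    end = parts[3] if len(parts) == 4 else None
    return Loop(label, variable, 1, end)


# Hypothetical loops, for illustration only.
print(parse_loop("{l_children, _child, 1, 10}"))
print(parse_loop("{l_jobs, _job, 1}"))
```

Loop whiles are deliberately not handled here, since they are added later in Archivist.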
Add the CLOSER condition logic in curly brackets, and a condition label on a separate line above the condition logic. See the readme on GitLab for formatting examples.
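A quick check that a condition label has its logic on the line directly below, wrapped in curly brackets, can be sketched as follows. It treats the logic as opaque text and does not validate CLOSER condition syntax; the example expression, the rule that labels contain no spaces, and the function name are all assumptions for illustration.

```python
def parse_condition(label_line, logic_line):
    """Pair a condition label with the curly-bracketed logic on the line below it.

    Returns (label, logic) with the brackets removed. The logic itself is not
    interpreted. Assumes (hypothetically) that labels contain no spaces.
    """
    label = label_line.strip()
    logic = logic_line.strip()
    if not label or " " in label:
        raise ValueError(f"expected a single-word condition label: {label_line!r}")
    if not (logic.startswith("{") and logic.endswith("}")):
        raise ValueError(f"condition logic must be in curly brackets: {logic_line!r}")
    inner = logic[1:-1].strip()
    if not inner:
        raise ValueError(f"condition logic is empty: {logic_line!r}")
    return label, inner


# Hypothetical label and logic, for illustration only.
print(parse_condition("c_q5", "{q4 == 1}"))
```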
Copy and paste the questionnaire into Excel:
1. Add a new column numbering the lines in ascending order, so that the original order can be restored after filtering.
2. Filter the text by colour so it is grouped together by colour.
3. Add a new column for the element tags.
4. Tag the elements (e.g., add "question label" to the tags column for all the brown text, "question literal" for all the green text, etc.).
5. Sort by the numbered column in ascending order, expanding the sort to all columns, so that the text is back in questionnaire order.
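The numbered column lets the lines be regrouped by colour for tagging and then put back in their original order. A minimal Python sketch of the same idea, assuming each line is available as a (text, colour) pair with plain colour names; the tag strings other than "question label" and "question literal" are assumed to follow the colour key above, so check them against the readme. The function name and example lines are hypothetical.

```python
COLOUR_TO_TAG = {
    "brown": "question label",
    "green": "question literal",
    "red": "instruction",
    "yellow": "code list",
    "orange": "response domain",
    "black": "statement label",
    "pink": "statement",
    "light blue": "sequence",
    "dark blue": "condition label",
    "purple": "condition text",
    "grey": "loop",
}


def tag_lines(lines):
    """Tag (text, colour) pairs and return (tag, text) pairs in questionnaire order.

    Raises KeyError for a colour that is not in COLOUR_TO_TAG.
    """
    # Step 1: number every line so the original order can be restored later.
    numbered = [(n, text, colour) for n, (text, colour) in enumerate(lines, start=1)]
    # Step 2: group the lines by colour (the equivalent of filtering by colour).
    by_colour = sorted(numbered, key=lambda row: row[2])
    # Steps 3-4: add a tag column based on each line's colour.
    tagged = [(n, COLOUR_TO_TAG[colour], text) for n, text, colour in by_colour]
    # Step 5: sort on the number column to put the lines back in order.
    tagged.sort(key=lambda row: row[0])
    return [(tag, text) for _, tag, text in tagged]


# Hypothetical lines, for illustration only.
lines = [("Q1", "brown"), ("What is your name?", "green"), ("Ask all", "red")]
for tag, text in tag_lines(lines):
    print(f"{tag}: {text}")
```

Sorting on the number rather than relying on the grouped order is what guarantees the output matches the questionnaire, however the rows were rearranged in between.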
Copy and paste the questionnaire back into Word using the text-only option. Save the document as a .txt file.
Send Becky the .txt file to parse the questionnaire into Archivist.
1. Open the parse_ncds_capi_txt GitLab page and upload the tagged questionnaire document to the tagged_input folder, deleting the existing file once the new one has been uploaded.
2. Run the GitLab pipeline, which will generate an XML file.
3. Run the XML file through archivist_insert.
4. Download the XML and import it into staging or build Archivist, ensuring that the questionnaire name and agency are correct.
5. Do a quick check to make sure it looks reasonable (see Checking parsed questionnaires before editing), and then load it into the correct instance of Archivist.
The questionnaire is parsed in flat, so nest it in Archivist, add the loop whiles (if there are any), and check for errors before it is passed on for verification.