Datasets are created by the study, so CLOSER doesn't handle any of the data. The study can choose how to organise their datasets to be consistent with their systems and user access.

Therefore there is a many-to-many relationship between datasets and questionnaires. There can be several datasets per questionnaire, or several questionnaires per dataset.
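As a rough illustration of this relationship, the sketch below stores dataset/questionnaire links as pairs and indexes them in both directions. The dataset and questionnaire names are hypothetical and are not taken from any study.

```python
from collections import defaultdict

# Hypothetical links: each pair says "this dataset holds data from this questionnaire".
links = [
    ("dataset_a", "questionnaire_1"),
    ("dataset_b", "questionnaire_1"),
    ("dataset_b", "questionnaire_2"),
]


def index_links(pairs):
    """Build lookups in both directions from (dataset, questionnaire) pairs."""
    by_dataset = defaultdict(set)
    by_questionnaire = defaultdict(set)
    for dataset, questionnaire in pairs:
        by_dataset[dataset].add(questionnaire)
        by_questionnaire[questionnaire].add(dataset)
    return by_dataset, by_questionnaire


by_dataset, by_questionnaire = index_links(links)
```

Here questionnaire_1 maps to two datasets and dataset_b maps to two questionnaires, which is why neither side can simply hold a single reference to the other.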

If the study is funded by the ESRC, it is a requirement that the data are deposited in the UKDS, so the dataset metadata provided by these studies usually matches what is available at the UKDS, and the DOI for this is provided.

Studies that manage their own data access, or provide access through other platforms such as DPUK, should provide metadata for the datasets that users can request, and a stable URL is provided.

Since CLOSER Discovery only provides metadata, it is usually acceptable for the datasets to contain variables which are under controlled access. 

Datasets are contained within the sweep/wave in CLOSER Discovery; see the sweep page for more information.


DDI is a flexible standard, and different users are free to implement it in slightly different ways. This is a strength, but some adjustments need to be made when moving between implementations. Data files should meet certain criteria so that they generate a standard SledgeHammer output, which is then lightly edited to provide a consistent structure for ingest into Colectica Repository.

The following provides a detailed overview of the processes involved. Please see the step by step guide for the simple steps.

Variable metadata workflow

[Image: variable metadata workflow]

Data Files

Studies should be encouraged to generate data files that are of the standards the UK Data Service

...

SPSS files hold a lot of hidden information. SledgeHammer will try to use this, which can lead to issues when outputting the DDI-L XML. We recommend using something like the following syntax to remove this extraneous information. It replaces the file label (often the location of the original file) with the bundle name and drops any documents:

get file="G:\DB\closer_data\bcs70\bcs_1975\bcs_1975_masc.sav".
FILE LABEL "bcs_75_msc".
DROP DOCUMENT.
EXECUTE.
sysfile info file="G:\DB\closer_data\bcs70\bcs_1975\bcs_1975_masc.sav".
save outfile="G:\DB\closer_data\bcs70\bcs_1975\bcs_1975_masc.sav".
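When many files need the same treatment, the syntax can be generated rather than typed by hand. The following is a minimal sketch, not part of the project's tooling: it fills the syntax shown here into a template for each file. The function names and the second file path are hypothetical.

```python
# Template copied from the SPSS syntax in this guide; only the path and
# bundle name change from file to file.
SYNTAX_TEMPLATE = '''get file="{path}".
FILE LABEL "{bundle}".
DROP DOCUMENT.
EXECUTE.
sysfile info file="{path}".
save outfile="{path}".
'''


def cleanup_syntax(path: str, bundle: str) -> str:
    """Return SPSS syntax that relabels one .sav file and drops its documents."""
    return SYNTAX_TEMPLATE.format(path=path, bundle=bundle)


def cleanup_syntax_for_all(files: dict) -> str:
    """Join the syntax for several files; `files` maps .sav path -> bundle name."""
    return "\n".join(cleanup_syntax(path, bundle) for path, bundle in files.items())


example = cleanup_syntax_for_all({
    r"G:\DB\closer_data\bcs70\bcs_1975\bcs_1975_masc.sav": "bcs_75_msc",
    r"G:\DB\closer_data\example\example.sav": "example_bundle",  # hypothetical file
})
print(example)
```

The printed text can be saved as a syntax file and run in SPSS.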

Use of SledgeHammer

SledgeHammer is a product released by Metadata Technology North America (MTNA) that extracts metadata from a wide range of data formats. Although it can be run interactively, the project uses batch files so that output is generated consistently.

The project uses a restricted set of these commands:

Command | Examples | Explanation
-ag | uk.cls.mcs, uk.alspac | Agency
-rename | alspac_00_ayc, nshd_46_tcs | This should be the name of the bundle with which the data is associated
-ddi | Always 3.2-RP | DDI version; this should not be changed
-ddipd | Always proprietary | Only output proprietary format, not ASCII
-har | “No options” | This creates unified codelists, i.e. a single Yes/No
-ddilang | Always en-GB | This should stay as en-GB as this is the default setting we will be using
-ddiref | Always URN | Internal DDI URN definition
-ddiurn | Always canonical | Canonical; this should not be changed
-pretty | “No options” | This is so it looks half decent if you look at it by hand
-stats | min, max, valid, invalid | Description of statistics generated per variable. Optional: stddev and freq
-opt | Always full | Optimised output
-scan | “No options” | Outputs metadata or entire full and includes no. of cases and variables
(none) | ../bcs70/bcs_1970.sav | Name and path of input data file. This is always the last line

...

sledgehammer-cl.bat" ^
-ag uk.cls.bcs70 ^
-rename bcs_75_mcs ^
-ddi 3.2-RP ^
-ddipd proprietary ^
-har ^
-ddilang en-GB ^
-ddiref urn ^
-ddiurn canonical ^
-pretty ^
-opt full ^
-scan ^
-stats max,min,mean,mode,valid,invalid,freq,stdev ^
../bcs70/bcs_1975/bcs_1975_masc.sav
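Because only the agency, bundle name and input file change between batch files, the files can be generated from a template. The sketch below is one possible way to do this, not the project's own tooling. It reuses exactly the options shown in the example; the function name is hypothetical, and the launcher argument is a placeholder, since the full path to sledgehammer-cl.bat depends on the local installation.

```python
def batch_text(agency: str, bundle: str, data_file: str,
               launcher: str = "sledgehammer-cl.bat") -> str:
    """Return the text of a Windows batch file that runs SledgeHammer once.

    `launcher` is a placeholder: replace it with the local path to the
    SledgeHammer command-line script.
    """
    options = [
        f"-ag {agency}",
        f"-rename {bundle}",
        "-ddi 3.2-RP",
        "-ddipd proprietary",
        "-har",
        "-ddilang en-GB",
        "-ddiref urn",
        "-ddiurn canonical",
        "-pretty",
        "-opt full",
        "-scan",
        "-stats max,min,mean,mode,valid,invalid,freq,stdev",
    ]
    # Every line except the last ends with the batch continuation character ^;
    # the input data file is always the last line.
    continued = [line + " ^" for line in [launcher, *options]]
    return "\n".join(continued + [data_file]) + "\n"


text = batch_text("uk.cls.bcs70", "bcs_75_mcs",
                  "../bcs70/bcs_1975/bcs_1975_masc.sav")
print(text)
```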

Metadata Edits (DDI flavour)

For display purposes, and for ease of navigation and ingest, a consistent set of names should be applied to the SledgeHammer output through a series of edit scripts prior to ingest. These are written in Python and, if they cannot be run at the study, can be run at CLOSER instead.

Edit script | Explanation
fandr.py | Inserts <r:String> where absent from output
fandr2.py | Names the DDI Instance
fandr3.py | Names the Physical Instance
fandr4.py | Names the Logical Product
fandr5.py | Names the Code List scheme
fandr6.py | Names the Data Product Name
fandr7.py | Adds Dataset URI and whether public
fandr8.py | Adds Title and Alternate Title to DDI Instance
fandr9.py | Corrects Valid to be ValidCases
fandr10.py | Corrects Invalid to be InvalidCases
fandr11.py | Adds naming to DataRelationship

The scripts use a tab-delimited file called rename_list.txt, which includes the following:

Short Name - the name of the metadata bundle with which the dataset is associated

Long Name - the name you want to display as a human-readable description

DOI - if available, this allows the user to navigate to the DOI and the relevant citation. If a DOI is not available, the address of a website where the data can be accessed can be used instead.

Public - 1 is to be used when a website address or DOI has been provided.

...

Short name | Long Name | DOI | Public
mcs_03_na | MCS2 Neighbourhood Assessment | http://dx.doi.org/10.5255/UKDA-SN-5350-3 | 1
mcs4_teacher | MCS4 Teacher Survey | http://dx.doi.org/10.5255/UKDA-SN-6848-1 | 1
mcs5_sc | MCS5 Child Paper Self-Completion | http://dx.doi.org/10.5255/UKDA-SN-7464-2 | 1
mcs5_teacher | MCS5 Teacher Survey | http://dx.doi.org/10.5255/UKDA-SN-7464-2 | 1
ncds8_sc | NCDS8 Paper Self-Completion | http://dx.doi.org/10.5255/UKDA-SN-6137-2 | 1
pms | Perinatal Mortality Study | http://dx.doi.org/10.5255/UKDA-SN-5565-2 | 1
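To show how a file of this shape can be read, here is a minimal sketch using Python's csv module. It is not the code used by the edit scripts. It assumes four tab-separated columns in the order above and no header row; the class and function names are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class RenameEntry:
    short_name: str   # metadata bundle name
    long_name: str    # human-readable display name
    doi: str          # DOI or website address
    public: bool      # True when the Public column is 1


def read_rename_list(handle) -> dict:
    """Read tab-delimited rows into a dict keyed by short name."""
    entries = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 4:
            continue  # skip blank or incomplete lines
        short, long_name, doi, public = (cell.strip() for cell in row[:4])
        entries[short] = RenameEntry(short, long_name, doi, public == "1")
    return entries


# Two rows from the example table, held in memory instead of rename_list.txt.
sample = (
    "mcs_03_na\tMCS2 Neighbourhood Assessment\t"
    "http://dx.doi.org/10.5255/UKDA-SN-5350-3\t1\n"
    "pms\tPerinatal Mortality Study\t"
    "http://dx.doi.org/10.5255/UKDA-SN-5565-2\t1\n"
)
entries = read_rename_list(io.StringIO(sample))
print(entries["pms"].long_name)
```

For a real file, pass an open file handle, for example open("rename_list.txt", newline="", encoding="utf-8"); the encoding is also an assumption.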

Control File

A control file can be used to batch up the batch files and then run the edits across all the files:

...

Once the edits have been run, the file can be imported into Colectica Designer to check that it is well formed. Alternatively, you can check that DDI-Flavour has been run by checking that the XML has the new title at the beginning of the file.
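Both checks can also be approximated in Python before importing into Colectica Designer. The sketch below is a rough pre-check, not a replacement for the import. The function names are hypothetical, the sample XML is a simplified stand-in rather than real DDI output, and the 2000-character window used for "the beginning of the file" is an arbitrary assumption.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML (well-formedness only, not DDI validity)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def title_near_start(xml_text: str, title: str, window: int = 2000) -> bool:
    """Return True if the expected title appears in the first `window` characters."""
    return title in xml_text[:window]


# Simplified stand-in for an edited file; real DDI output has a different structure.
sample = "<Instance><Title>MCS5 Teacher Survey</Title></Instance>"

print(is_well_formed(sample))
print(title_near_start(sample, "MCS5 Teacher Survey"))
```

To check a file on disk, read its text first and pass it to both functions; the title to look for is the Long Name from rename_list.txt.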

Please see the step-by-step guide for the setup requirements and for examples.

1. Setup Sledgehammer

  • Run SledgeHammerInstaller.exe to install the software, usually in the programs folder on the C: drive. You must have admin permissions.
  • Run the software from Sledgehammer.jar or from the shortcut and select the license: go to settings > Licensing and browse to the location of the license file, e.g. JonJohnson.UCL.ent20.lic

2. Setup Python

...

3. Setup DDI-Flavour

4. Install Java runtime

5. Prepare SPSS files

  • Only run Sledgehammer on SPSS files.
  • SPSS files hold a lot of hidden information. SledgeHammer will try to use this, which can lead to issues when outputting the DDI-L XML. We recommend using something like the following syntax to remove this extraneous information. It replaces the file label (often the location of the original file) with the bundle name and drops any documents:

get file="G:\DB\closer_data\bcs70\bcs_1975\bcs_1975_masc.sav".
FILE LABEL "bcs_75_msc".
DROP DOCUMENT.
EXECUTE.
sysfile info file="G:\DB\closer_data\bcs70\bcs_1975\bcs_1975_masc.sav".
save outfile="G:\DB\closer_data\bcs70\bcs_1975\bcs_1975_masc.sav".

6. Prepare batch files

7. Run Sledgehammer and DDI-Flavour

...