Reading Excel files

Here is a snapshot of a portion of an Excel spreadsheet:

Note that row 1 of the Excel spreadsheet is a header row containing variable names for the columns.

When working with Excel spreadsheets or delimited text files, use row 1 of the file to supply variable names that you intend to use in SPSS Statistics.

IBM SPSS Statistics can directly read an Excel sheet. There are different implementations in different recent releases of IBM SPSS Statistics but, in general, the capability exists on the File menu. In IBM SPSS Statistics 24, use the following path:

File | Import Data

Here is the Read Excel File dialog box:

By default, IBM SPSS Statistics shows the entire range of data that it encounters in the Excel sheet. You can use the Range portion of the dialog box to specify a subset range. Also by default, IBM SPSS Statistics expects to find variable names in the first row of data. Additional checkboxes exist, but the default settings will work for the Excel file that we are analyzing. Finally, click on OK to read the file, or click on Paste to paste the constructed syntax to the syntax window.

Here is the pasted syntax (IBM SPSS Statistics version 24):

GET DATA
  /TYPE=XLSX
  /FILE='C:\Users\Tony\Documents\KSBSPSSBOOK_DATA\chapter2\gss2014\gss2014extract.xlsx'                                                                                                        
  /SHEET=name 'gss2014extract'
  /CELLRANGE=FULL
  /READNAMES=ON
  /DATATYPEMIN PERCENTAGE=95.0
  /HIDDEN IGNORE=YES.
EXECUTE.
DATASET NAME DataSet1 WINDOW=FRONT.

The subcommands have a close correspondence with the dialog box settings in the Read Excel File dialog box. Note that the command is the GET DATA command, and the /TYPE subcommand specifies that the input file is an XLSX file. Note that the slash (/) is used to separate subcommands. The /SHEET subcommand points to the particular worksheet. /CELLRANGE tells IBM SPSS Statistics to read the full range of data. /READNAMES tells IBM SPSS Statistics that the first row is a header row.

The EXECUTE command in the above pasted syntax reads the active dataset and causes execution of any pending commands.

Finally, the DATASET NAME command assigns the name Dataset1 to the active dataset and brings the SPSS Statistics Data Editor window to the front.

Running the indicated syntax populates an IBM SPSS Statistics Data Editor window with the data, as displayed in the following screenshot:

IBM SPSS Statistics uses the variable names in the header row with one minor change--the variable name ID_ in the header row of the Excel sheet is automatically changed to ID. This is covered in the section with the rules to name variables.