Dr. Alex Dong's Blog

SAS Series 3: Work with SAS data sets

Overview

Before you can work with your data in SAS, it must be in a special form called a SAS data set. So understanding SAS data sets is the first step in learning about SAS programming.

Conceptually, a SAS data set (also called a table) is a file containing descriptor information and related data values. The file is organized as a table of observations (rows) and variables (columns) that SAS can process. Some SAS data sets also contain an index, which enables SAS to locate records in the data set more quickly.

In some special cases, such as using SAS/ACCESS to read database management system files directly, your SAS data set may contain only the logic for accessing the data, not the data itself. But for this tutorial, we’ll assume that SAS data sets contain data.

To work with SAS data sets, you also need to understand how they are stored. All SAS files are stored in a SAS library, which is a collection of files such as SAS data sets and catalogs. In the Windows and Unix environments, a SAS library is typically a group of SAS files in the same folder or directory.

In some operating environments, a SAS library is a physical collection of files. In others, the files are only logically related.

To access a library, you assign it a name (also known as a libref, or library reference). You can think of library names as nicknames or shortcuts that you use to identify libraries during a SAS session.

View files in a SAS library

SAS assigns three libraries automatically each time you start SAS. In this task, you learn about these libraries and view file types in the Sashelp library.

  1. In the Explorer window, double-click Libraries. Notice that there are three libraries. These libraries are automatically assigned each time you start SAS:

    Sashelp
           a permanent library that contains sample data and other files that control how SAS works at your site. This is a read-only library.

    Sasuser
           a permanent library that contains SAS files in the Profile catalog that store your personal settings. This is also a convenient place to store your own files.

    Work
            a temporary library for files that do not need to be saved from session to session.
  2. Double-click the Sashelp library.
  3. Scroll the Explorer window and notice that the library contains several types of files, or members, each identified by its own icon.
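
The same listing can also be produced with a short program instead of the Explorer window. The following is a minimal sketch, assuming a standard SAS session in which the Sashelp library is assigned automatically as described above. It writes a directory of the library's members, with their member types, to the SAS log.

```sas
/* List every member of the Sashelp library in the SAS log. */
proc datasets library=sashelp;
quit;

/* Optional: restrict the listing to data sets only. */
proc datasets library=sashelp memtype=data;
quit;
```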

Assign a library

When you define a library, you tell SAS where your SAS files are located. Once you define a library, you can manage SAS files within it. In this task, you use the New Library window to assign a library to use in this tutorial.

  1. On the toolbar, click the New Library tool. The New Library window opens.
  2. In the Name box, type MyLib.
    Library names

    • are limited to 8 characters
    • must start with a letter or underscore
    • can contain only letters, numerals, or underscores.
  3. Select the Enable at Startup check box. This library will be automatically assigned each time you start a SAS session.
  4. Click Browse. Select the default location or select another location in your operating environment. Any files that you save to the Mylib library will be saved in the directory or folder that you designate in the Path box. Click OK.

  5. Click OK to close the New Library window.

You can delete SAS libraries. When you delete a SAS library, SAS no longer has access to the directory. However, the contents of the library still exist in your operating environment.
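
For readers who prefer code to the New Library window, a library can also be assigned and removed with the LIBNAME statement. This is a sketch only: the folder path is a placeholder (an assumption, not taken from this tutorial), and a libref assigned this way lasts for the current SAS session rather than being enabled at startup.

```sas
/* Assign the libref MyLib to a folder.                     */
/* The path is hypothetical; use a folder that exists on    */
/* your own system.                                         */
libname mylib 'C:\sasdata\mylib';

/* Remove the libref when you no longer need it.            */
/* The folder and the files in it are not deleted.          */
libname mylib clear;
```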

Add and rename a SAS data set

Now that you have a library, you can add a SAS data set to the library.

  1. With the Explorer window active, select View -> Show Tree. The libraries are displayed in the left pane of the window.
  2. Click the Sashelp library.
  3. Drag the Prdsale data set from the right pane and drop it into the Mylib library on the left.
  4. Click MyLib. Notice that Prdsale has been copied there.
  5. Right-click Prdsale and select Rename. Type ProductSales for the new name and click OK.
    SAS data set names must

    • be 1 to 32 characters in length
    • begin with a letter (A-Z, including mixed case characters) or an underscore (_)
    • continue with any combination of numbers, letters, or underscores.
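
The drag-and-drop copy and the rename can be approximated in code. The sketch below assumes that the Mylib library from the previous task is assigned in the current session. It uses PROC COPY to copy Prdsale from Sashelp and PROC DATASETS to rename the copy.

```sas
/* Copy the Prdsale data set from Sashelp into Mylib. */
proc copy in=sashelp out=mylib memtype=data;
   select prdsale;
run;

/* Rename the copy to ProductSales. */
proc datasets library=mylib nolist;
   change prdsale=ProductSales;
quit;
```

The NOLIST option only suppresses the directory listing in the log; it does not change what is renamed.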

Open a SAS data set

Now that you’ve copied and renamed a data set, view the data that it contains.

There are many ways to get your data into a SAS data set. In general, you can

  • enter data directly into a SAS data set using the VIEWTABLE window
  • read raw data into a SAS data set using the Import wizard or SAS programming statements
  • read and modify existing data sets using SAS programming statements
  • convert other vendors’ data files into SAS data sets using SAS/ACCESS
  • read other vendors’ data directly using SAS/ACCESS.

  1. In the Explorer window, double-click the ProductSales table in the Mylib library. The table opens in the VIEWTABLE window.
  2. Scroll the VIEWTABLE window and notice that there are 1440 rows (also called observations) and 10 columns (also called variables).
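
If you would rather look at the data in output than in the VIEWTABLE window, PROC PRINT is one option. This sketch assumes that Mylib.ProductSales exists as created in the previous task, and it limits the listing to the first 10 observations so the output stays short.

```sas
/* Print the first 10 observations of the copied data set. */
proc print data=mylib.productsales(obs=10);
   title 'First 10 observations of Mylib.ProductSales';
run;

/* Clear the title so it does not carry over to later output. */
title;
```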

View general data properties

The descriptor portion of a SAS data set contains information about the data set, including

  • the name of the data set
  • the date and time the data set was created
  • the number of observations
  • the number of variables.

You can see this information by viewing the general properties of a data set.

  1. In the Explorer window, right-click the ProductSales table and select Properties.
  2. In the General tab, view the data set’s properties. (Don’t close the window yet. You’ll need it for the next step.)
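
The descriptor portion can also be displayed with PROC CONTENTS. As a minimal sketch, assuming Mylib.ProductSales exists, the step below reports the data set name, creation date and time, and the number of observations and variables, followed by a list of the variables and their attributes.

```sas
/* Display the descriptor portion of the data set. */
proc contents data=mylib.productsales;
run;
```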

View column attributes

In addition to general information about the data set, the descriptor portion contains attribute information for each variable in the data set. The attribute information includes the variable’s name, type, length, format, informat, and label.

  1. In the Mylib.ProductSales Properties window, click the Columns tab. In the Column Name column, notice that all the variables for the data set are listed along with a symbol that indicates the variable’s type.
    Variable (column) names must

    • be 1 to 32 characters in length
    • begin with a letter (A-Z, including mixed case characters) or an underscore (_)
    • continue with any combination of numbers, letters, or underscores.

    SAS is case insensitive for variable names, but it remembers the case used in the first occurrence of the variable and writes the name that way in output.

  2. Next, look at the Label column. A label is descriptive text up to 256 characters. Labels are used instead of variable names in some reports and for the column headings in the VIEWTABLE window.
  3. Now look at the Type column. There are only two types of variables in SAS: character and numeric. Character variables are listed as Text in the Type column, and numeric variables are listed as Number. The Length attribute is related to the variable’s type.
    Character variables

    • can contain any values
    • use a blank to represent missing values
    • can be up to 32K long.

    Numeric variables

    • can contain only numeric values (the digits 0 through 9, +, -, ., and E for scientific notation).
    • use a single period (.) to represent missing values.
    • have a default length of 8. Numeric values (no matter how many digits they contain) are stored as floating point numbers in 8 bytes of storage, unless you specify another length.
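
The two variable types and their missing values can be illustrated with a short DATA step. This is a sketch with made-up names (the data set Work.Types_Demo and its variables are hypothetical, not part of ProductSales). It creates one character and two numeric variables and writes their values to the SAS log.

```sas
/* Hypothetical demo data set in the temporary Work library. */
data work.types_demo;
   length name $ 12;   /* character variable, 12 bytes long      */
   name   = ' ';       /* a blank is a missing character value   */
   amount = .;         /* a period is a missing numeric value    */
   big    = 1.5E3;     /* scientific notation for a numeric value */
   put name= amount= big=;   /* write the values to the log */
run;
```

Because no length is specified for amount and big, they take the default numeric length of 8 bytes.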

Change a column format

Formats are variable attributes that affect the way data values are written. SAS offers a variety of formats for numeric and character data. In this task you change the format of a variable.

  1. In the Mylib.ProductSales Properties window, look at the Format column. The ACTUAL variable uses the DOLLAR12.2 format. This format displays the value 12345 as $12,345.00 in a report.
    In general, SAS formats have

    • a name
    • a w value, which specifies the width that is used for displaying the value
    • a period following the w value.

    Numeric SAS formats, such as the DOLLARw.d format, can also specify a d value, which is the number of decimal places to be displayed.

  2. Open the ProductSales table if it is not already open. Right-click the Actual Sales column heading and select Column Attributes from the shortcut menu.
  3. In the Column Attributes window, click the selector next to the Format box.
  4. In the Format window, view the available formats and their descriptions. When you select a format in this list, an example is displayed in the Format Details area. Suppose that you want to drop the decimal places for the values of the ACTUAL variable. Change the value in the Decimal box to 0, and then click OK.
  5. In the Column Attributes window, click Apply. Then click Close.
  6. View the ProductSales table and notice that the values in the Actual Sales column no longer contain decimal places.
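
The effect of the w and d values can also be seen in code. The sketch below first writes the same value to the log with and without decimal places, and then uses PROC DATASETS to change the format stored for ACTUAL. It assumes that Mylib.ProductSales exists, and it treats DOLLAR12. as the equivalent of setting the Decimal box to 0.

```sas
/* Write one value to the log with two different formats. */
data _null_;
   actual = 12345;
   put actual dollar12.2;   /* two decimal places */
   put actual dollar12.;    /* no decimal places  */
run;

/* Change the format stored in the descriptor portion. */
proc datasets library=mylib nolist;
   modify productsales;
   format actual dollar12.;
quit;
```

Changing a format affects only how values are displayed; the stored values themselves are not changed.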

Understand informats

An informat (input format) is the instruction that specifies how SAS reads raw data. SAS provides many informats for reading standard and nonstandard data values.

  1. In the ProductSales table, right-click the Predicted Sales column heading and select Column Attributes.
  2. In the Column Attributes window, notice that the PREDICT column uses the 12. informat. This is the w.d informat with a width of 12 and no decimal places. This informat reads standard numeric values.
  3. Click the selector button next to the Informat box. Scroll the list to see the informats that are available for reading numeric data. In the Informat window, click Cancel. In the Column Attributes window, click Close.
  4. In the ProductSales table, right-click the Country column heading and select Column Attributes.
  5. In the Column Attributes window, click the selector button next to the Informat box. In the Informats window, notice that the informats for reading character values begin with a dollar sign ($). Scroll the list of character informats and their descriptions to see what is available.
Remember that

  • Each informat contains a w value to indicate the width of the raw data field.
  • Each informat also contains a period, which is a required delimiter.
  • For some informats, the optional d value specifies the number of implied decimal places.
  • Informats for reading character data always begin with the dollar sign ($).
  6. Click Cancel. Then close the Column Attributes window.
  7. Close the VIEWTABLE window.

Now you’ve seen both the data and descriptor portions of the ProductSales data set. In the next task, you’ll learn about writing SAS programs.
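
As a small preview of such programs, the informat rules above can be tried with the INPUT function, which reads a character string using an informat. This is a sketch with made-up input strings; only the 12. informat comes from the ProductSales table, and the variable named implied is hypothetical.

```sas
data _null_;
   /* w.d informat with a width of 12: reads a standard numeric value */
   predict = input('925', 12.);

   /* d value of 2: the string contains no decimal point, so the    */
   /* last two digits are treated as implied decimal places (123.45) */
   implied = input('12345', 7.2);

   /* character informats begin with a dollar sign */
   country = input('CANADA', $10.);

   put predict= implied= country=;   /* write the results to the log */
run;
```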
