If you work with data, you work with variables. However, sometimes your SAS dataset doesn’t contain the variable you need.
So, how to create a new variable in SAS? In SAS, you create a new variable in the Data Step. First, you write down the name of your new variable, followed by an equal sign, and the definition of the variable. In this article, we discuss how to create three types of variables in SAS (numeric, character, and date) based on existing variables.
In this section, we discuss how to create a numeric variable.
A constant numeric variable is a variable that has the same value for each observation in the dataset. Examples are 1, 50, or 200. You create a constant numeric variable by writing down its name, an equal sign, its value, and a semicolon.
DATA work.output_data; SET work.input_data; my_numeric_var1 = 1; my_numeric_var2 = 50; my_numeric_var3 = 200; RUN;
You can also create a constant numeric variable using an arithmetic operation, like my_numeric_var = 2 + 3. SAS provides the following list of arithmetic operations.
DATA work.output_data; SET work.input_data; my_numeric_var1 = 2 + 2; my_numeric_var2 = 10 - 5; my_numeric_var3 = 3 * 5; my_numeric_var4 = 100 / 10; my_numeric_var5 = 2 ** 3; RUN;
Additionally, you can define a numeric variable without a value yet. You simply write down the name of the variable, an equal sign, and a period. Variables without values are called missing values.
DATA work.output_data; SET work.input_data; my_numeric_var1 = .; RUN;
If you have missing values in your dataset and want to replace them, check out this article.
Although it’s sometimes necessary to create a constant numeric variable, normally you create a numeric variable based on one or more existing variables, i.e., columns. Again, you first write down the name of the variable, followed by an equal sign, and its definition. The definition, in this case, is normally a SAS function that returns a numeric value with column names as its arguments.
DATA work.output_data; SET work.input_data; my_numeric_var1 = function(argument1 ); RUN;
There is a long list of SAS functions that return a numeric value. Here we show the most common ones.
Below we show some of these functions in SAS code.
DATA work.output_data; SET work.input_data; my_numeric_var_sum = SUM(column1, column2, column3); my_numeric_var_max = MAX(column1, column2, column3); my_numeric_var_mean = MEAN(column1, column2, column3); my_numeric_var_abs = ABS(column1); my_numeric_var_round = ROUND(column1, 0.01); my_numeric_var_exp = EXP(column1); RUN;
Note that for some functions, like SUM() and MEAN(), the number of columns (arguments) can be very large. If all arguments have a similar name, you can use the colon (:) modifier to write more efficient code.
A character variable is string that contains at least one (special) character. In this section, we discuss how to create a new character variable in SAS.
Like constant numeric variables, there also exist constant character variables. You create a constant character variable by writing down its name, the equal sign, and the definition. You write the definition between (double) quotes. Here are some examples.
DATA work.output_data; SET work.input_data; my_char_var1 = "Hello"; my_char_var2 = "Mr. Smith"; my_char_var3 = "Y"; RUN;
If you write a number between double quotes, like “10”, then SAS interprets this value as a character string.
You can also create a character variable based on existing variables (columns). To create a variable based on existing variable, you write down the name of your new variable, followed by the equal sign, and a function that returns a character string. The arguments of this function are the existing variables.
SAS provides a long list of predefined functions that return a character string. Here we have listed the most common functions.
Below are some example of how to create a new character variable based on existing variables.
DATA work.output_data; SET work.input_data; my_char_lowcase = LOWCASE(column1); my_char_upcase = UPCASE(column1); my_char_compress = COMPRESS(column1); my_char_strip = STRIP(column1); my_char_tranwrd = TRANWRD(column1, "Avenue", "Av."); my_char_substr = SUBSTR(column1, 1, 4); my_char_catx = CATX("/", column1, column2, column3); RUN;
In this section, we discuss how to create a new Date variable. A SAS date is stored as the number of days since January 1st, 1960. So, for example, December 7th, 2020, is stored as 22.256.
You can create a SAS date variable by defining its name, an equal sign, and the date. However, if you define the date as “12/7/2020”, then SAS interprets this as a character variable. So, to create an actual date variable you have to define the variable as the numerical equivalent of “12/7/2020” (i.e., 22256), or as “7DEC2020″D. If you use the letter “D” after the date (written in DATE9. format), then SAS interprets your input as a date.
DATA work.output_data; SET work.input_data; my_date_var = "7DEC2020"D; format my_date_var DATE9.; RUN;
If you want to create a Date Time variable, then the correct syntax is “7DEC2020:0:0:0″DT. The letters “D” and “DT” make your code more readable.
You can also create a new date variable based on other, existing variables. SAS provides many Date (or Date Time) functions. Here we show the most important.
You can apply these functions the following way to create a new date variable.
DATA work.output_data; SET work.input_data; my_date_date = DATE(); my_date_datetime = DATETIME(); my_date_datepart = DATEPART(column1); my_date_mdy = MDY(column1, column2, column3); my_date_dhms = DHMS(column1, 0, 0, 0); my_date_intck = INTCK("MONTH", column1, column2); /* Difference in Months between column1 and column2 */ my_date_intnx = INTNX("DAY", column1, 2); /* Add 2 days to the date in column1 */ RUN;
Don’t forget to apply the correct format after defining the variable. Otherwise, SAS will show the numeric representation of the date (time).
So far, we have discussed how to create new variables of different types in a SAS Data Step. However, you can also create a new variable with the PROC SQL procedure. Again, the variable can be a constant or based on an existing variable. The only difference between creating a new variable in a Data Step and with PROC SQL is how you define the new variable. The functions you can use are the same.
With a PROC SQL procedure you first define the value of your variable, then you write the AS keyword, followed by the name of your new variable. After the name of the variable, you can define the variable’s format.
PROC SQL; CREATE TABLE work.output_data AS SELECT *, 100 AS my_num_var1, ROUND(column1, 0.1) AS my_num_var2, "Hello" AS my_char_var1, CATX("/", column1, column2) AS my_char_var2, "07DEC2020"D AS my_date_var1 FORMAT=DATE9., MDY(12,7,2020) AS my_date_var2 FORMAT=DATE9. FROM work.input_data; QUIT;
You can also create a new variable on a condition with an IF statement (Data Step) or CASE statement (PROC SQL). This means that the value of your variable depends on other variables. Using these conditional statements gives you more flexibility. Below we provide a simple example.
SAS Data Step
DATA work.output_data; SET work.input_date; IF column1 > 0 THEN my_new_var = "Positive"; ELSE IF column1 < 0 THEN my_new_var = "Negative"; ELSE my_new_var = "Undefined"; RUN;
PROC SQL
PROC SQL; CREATE TABLE work.output_data AS SELECT * CASE WHEN column1 > 0 THEN "Positive" WHEN column1 < 0 THEN "Negative" ELSE "Undefined" END AS my_new_var; QUIT;