BASE SAS INTERVIEW QUESTIONS


Most Important Frequently Asked Base Sas Interview Questions



    1. Question 1. What Are The Three Types Of Join?

      Answer :

      The three types of join are inner join, left join and right join. The inner join option takes the matching values from both the tables by the ON option. The left join selects all the variables from the first table and joins second table to it. The right join selects all the variables of table b first and join the table a to it.

    2. Question 2. Have You Ever Used Proc Sql For Data Stigmatization?

      Answer :

      Yes I have used it for summarization at times…For e.g if I have to calculate the max value of BP for patients 101 102 and 103 then I use the max (bpd) function to get the maximum value and use group by statement to group the patients accordingly.

    3. Question 3. Tell Me About Your Sql Experience?

      Answer :

      I have used the SAS/ACCESS SQL pass thru facility for connection with external databases and importing tables from them and also Microsoft access and excel files.Besides this, lot of times I have used PROC SQL for joining tables.

    4. Question 4. Once You Have Had The Data Read Into Sas Data Sets Are You More Of A Data Step Programmer Or A Proc Sql Programmer?

      Answer :

      It depends on what types of analysis datasets are required for creating tables but I am more of a data step programmer as it gives me more flexibility. For e.g creating a change from baseline data set for blood pressure sometimes I have to retain certain values …use arrays ….or use the first. -and last. variables.

    5. Question 5. What Types Of Programming Tasks Do You Use Proc Sql For Versus The Data Step?

      Answer :

      Proc SQL is very convenient for performing table joins compared to a data step merge as it does not require the key columns to be sorted prior to join. A data step is more suitable for sequential observation-by-observation processing.PROC SQL can save a great deal of time if u want to filter the variables while selecting or u can modify them …apply format….creating new variables , macro variables…as well as sub setting the data.PROC SQL offers great flexibility for joining tables.

    6. Question 6. Have U Ever Used Proc Sql To Read In A Raw Data File?

      Answer :

      No. I don’t think it can be used.

    7. Question 7. How Do You Merge Data In Proc Sql?

      Answer :

      The three types of join are inner join, left join and right join. The inner join option takes the matching values from both the tables by the ON option. The left join selects all the variables from the first table and joins second table to it. The right join selects all the variables of table b first and join the table a to it.

      PROC SQL;

      CREATE TABLE BOTH AS

      SELECT A.PATIENT,

      A.DATE FORMAT=DATE7. AS DATE,

      A.PULSE,B.MED, B.DOSES,

      B.AMT FORMAT=4.1

      FROM VITALS A INNER JOIN DOSING B

      ON (A.PATIENT = B.PATIENT)

      AND(A.DATE = B.DATE)

      ORDER BY PATIENT, DATE;

      QUIT;

    8. Question 8. Why And When Do You Use Proc Sql?

      Answer :

      Proc SQL is very convenient for performing table joins compared to a data step merge as it does not require the key columns to be sorted prior to join. A data step is more suitable for sequential observation-by-observation processing.

      PROC SQL can save a great deal of time if u want to filter the variables while selecting or we can modify them, apply format and creating new variables, macro variables. as well as subsetting the data. PROC SQL offers great flexibility for joining tables.

    9. Question 9. Difference Between Proc Means & Proc Summary?

      Answer :

      • Proc Summary defaults to NOPRINT Proc Means defaults to PRINT
      • If you omit the VAR statement, then PROC SUMMARY produces a simple count of observations, whereas PROC MEANS tries to analyze all the numeric variables that are not listed in the other statements
      • If you specify statistics on the PROC SUMMARY statement and the VAR statement is omitted, then PROC SUMMARY stops processing and an error message is written to the SAS log.
      • If you omit the VAR statement, then PROC MEANS analyzes all numeric variables that are not listed in the other statements. When all variables are character variables, PROC MEANS produces a simple count of observations.

    10. Question 10. How Will You Write A Code To Create A Sas Data Set With No Data In It/ Or How To Validate A Data Set Without Actual Creating It?

      Answer :

      1)Use OPTIONS OBS = 0
      2) Creating a data set by using the like clause.
       ex: proc sql;
      create table latha.emp like oracle.emp;
      quit;
      In this the like clause triggers the existing table structure to be copied to the new table. using this method 
      result in the creation of an empty table.
      3)  data check;
                 set _null_;
           run;
      4) data test;
              delete;
          run;

    11. Question 11. Do You Use Proc Report Or Proc Tabulate? Which Do You Prefer?

      Answer :

      I prefer Proc report as it is highly customizable and flexible where I can define each column in whatever way I  want to and even make use of SAS functions, logic processing, and assignment statements and create new. 

      Variables for report making use of the compute block of proc report. I referred proc report because it is more efficient tool than tabulate because with report we can do means frequency and tabulate also.

    12. Question 12. How Will You Read An Input File Into Sas?

      Answer :

      Use the statements INFILE to point to the file reference(should be defined using Filename) / to the file path. Use INPUT statement to read the data into the sas dataset.

    13. Question 13. How Would You Remove A Format That Has Been Permanently Associated With A Variable?

      Answer :

      we can remove the format by using proc datasets:

      Proc datasets;

      modify <data set name>;

      format <variable name>(which variable format needs to 

      modify>;

      run;

      quit;

    14. Question 14. How To Display Duplicate Observations In A Data Using Base Sas?

      Answer :

      There are two ways to display duplicate observations:

      1. In data step, using first.var and last.var
      2. Using proc sort with option dupout option.

    15. Question 15. In The Flow Of Data Step Processing, What Is The First Action In A Typical Data Step?

      Answer :

      1)Compilation Phase:

      When you submit a DATA step, it reads the input statements it creates an input buffer and brings the variables and observations. it is a logical memory area and pdv brings the observations at a time from input buffer 

      and checks the errors. PDV contains 2 automatic variables _n_ & _error_, these checks the errors in observations.

      _n_: indicates the no of obs.

      _error_: 1 if error occured

                   0 if no error

      After that it assigns the data values to appropriate variable and builds a sas dataset.

    16. Question 16. Does Sas ?translate? (compile) Or Does It ?interpret?

      Answer :

      A typical SAS program could contain DATA steps, PROC steps and macros. Macros are preprocessed. DATA steps are just in time compiled. PROC steps are interpreted in the order they appear in program. So when we submit a SAS program consisting of all these three components, the macro is compiled and executed first. If a DATA step is encountered, then it is compiled and executed. Note that the DATA step will not be executed if there is an error in the compilation. If a PROC step is encountered, it is interpreted and executed line by line. However i am not certain on this PROC step behavior.

    17. Question 17. At Compile Time When A Sas Data Set Is Read, What Items Are Created?

      Answer :

      Only PDV is generated at compilation time, whereas the automatic variables _N_ , _ERROR_ are generated at execution time only.

    18. Question 18. How Do U Validate Sas Program?

      Answer :

      When a sas code is submitted, SAS performs syntactical checks before executing the program/ code. In that case, one of the ways could be - at the beginning of the code, write OPTIONS OBS=0 in addition to other options and then RUN it. 

      This way data will not be processed and the log shows error messages/ warnings, if any. 

      If you are executing the SAS code on PC SAS, the highlighted colors itself shows the syntactical errors, if any.

    19. Question 19. Name And Describe Three Sas Functions That You Have Used, If Any?

      Answer :

      1. Compress - sas function which is used to remove spaces in string value and concatenate two values Without spaces.
      2. Input is another function of sas, it is one of conversion function in sas. It converts numeric into char.
      3. Put is another conversion function in sas. It converts char to numeric.
      4. SAS functions can be used to convert data and manipulate character variable values.

      different types of functions:

      1. TRIM
      2. SUBSTR
      3. ABS
      4. SCAN

      TRIM : Removing the trailing blanks from character expressions. 

                     syntax=trim(argument)

      SUBSTR: substr extracts the substring from an argument

                     syntax=substr(argument, position<,n>)

      Abs: Returns the absolute of the argument

                      Syntax=abs(argument)

      The most common functions that would be used are-

      Conversion functions - Input / Put / int / ceil / floor

      Character functions - Scan / substr / index / Left / trim / compress / cat / catx / upcase,lowcase

      Arithmetic functions - Sum / abs /

      Attribute info functions – Attrn / length

      Dataset – open / close / exist

      Directory  - dexist / dopen / dclose / dcreate / dinfo

      File functions – fexist / fopen/ filename / fileref

      SQL functions – coalesce / count / sum/ mean

      Date functions – date / today / datdif / datepart / datetime / intck / mdy

      Array functions – dim

    20. Question 20. When Looking For Data Contained In A Character String Of 150 Bytes, Which Function Is The Best To Locate That Data: Scan, Index, Or Index C?

      Answer :

      INDEX: Searches a character expression for a string of characters, and returns the position of the string's first character for the first occurrence of the string.

      INDEX (source, excerpt)

      it returns the position where the 2nd field is in the source

      Eg:  

           str1 = 'Hi i am fine here. how are u there ? .....';

           str2 = index(str1,'how'); --> str2 = 20

      SCAN is to get a substring upto mentioned character. Scan function the best for locating the particular word specified in a argument,scan function default length is 200bytes.

      data k;

      r='ganesh kumar';

      u=scan(r,2);

      proc print;

      run;

      result:  kumar

      INDEXC to locate only for mentioned one or more single character

              INDEXC(character-value, 'char1','char2','char3', ..)

      FIND: Searches for a specific substring of characters within a character string

       FIND(string, substring<,modifiers><,startups>)

      it returns the position the substring is in

      FINDC: To locate a character that appears or does not appear within a string. used to search for any one in a list of character values

      INDEXW: Searches for the substring as a word that follows a space in the provided sentence

      eg;string1= "there is a the here" ;

      INDEXW(STRING1,"the")   

      result: 12 (the word "the")

    21. Question 21. If You Have A Data Set That Contains 100 Variables, But You Need Only Five Of Those, What Is The Code To Force Sas To Use Only That Variable?

      Answer :

      Use KEEP option on a dataset to only select few variables from 100 variables. We can use KEEP option either on set statement or data step statement. If we use on SET statement then only the five variables are created on pdv and only these variables are sent to the output dataset. If we use KEEP option on data step statement then all the variables are copied into pdv and after any manipulation only the selected variables on data step statement are processed and sent to output dataset.

      data abc;

        set xyz (keep= ab cd ef gh);

      run;

      data abc(keep= ab cd ef gh ij);

        set xyz;

       ij=ab+jk;

      run;

    22. Question 22. What Is The Purpose Of The Trailing And How Would You Use Them?

      Answer :

      If the data is continuosly in data set SAS would read the first words only from each line in the `datelines' block and it will ignore the rest of the line. if we use Trailing @@'it will read completly.and another type of trailing is using single @ this is a line hold specifier.

      • Trailing @ is used to hold the record in input buffer to execute another input statement on the same datelines.
      • Trailing @@ is used to hold the record in input buffer to execute same input statement on same datelines intel eof record.
      • The trailing @ or more technically, line hold specifiers are used to hold the pointer in the same record for multiple iterations. 
      • The two tyoes of line hold specifiers are single trailing(@) and double trailing(@@).
      • The single trailing hold the record until it encounters either another input statement or end of the data step.  

      They are used for the records such as 

      001F38   H

      002 F 40 G

      To read these values to the data step

      Data example:

        input @10 type $ @;

        if type='H' then

          input @1 id 3. @4 gender $1. @5 age2.;

        else if type='G' then

          input @1 id3. @5 gender $1. @7 age 2.;

        end;

      cards;

      001F38   H

      002 F 40 G

      ;

      run;

      The double trailing holds the until the end of the record.

      Data example2:

        input id age @@;

      cards;

      001 23 002 43 003 65 004 32 005 54

      ;

      run;

    23. Question 23. What Is The Significance Of The ‘of’ In X=sum (of A1-a4, A6, A9);?

      Answer :

      It is use to tell sas to consider a set of values to be processed. In the above example, SUM(OF a1-a4,a6,a9) resolves to SUM(A1,A2,A3,A4,A6,A9).

      If we dont use 'OF' then it would be treated as a minus sign.. A1(Minus)-A4 and that is not what we are trying to accomplish.

    24. Question 24. How Do You Test For Missing Values?

      Answer :

      NMISS OPTION is used for missing values.

    25. Question 25. How Would You Create Multiple Observations From A Single Observation?

      Answer :

      line pointer is used for multiple lines per observation

        @@ is used for multiple observations per line

    26. Question 26. What Are Some Good Sas Programming Practices For Processing Very Large Data Sets?

      Answer :

      Arrays is used for processing for large data set.

    27. Question 27. Why Is Sas Considered Self-documenting?

      Answer :

      When data set is created sas create descriptor portion and data portion .that means sas stores the information like variable name ,length,type etc.

    28. Question 28. How Does Sas Handle Missing Values In: Assignment Statements, Functions, A Merge, An Update, Sort Order, Formats, Procs?

      Answer :

      When you check for ordinary missing numeric values, you can use code that is similar to the following: 

      if numvar=. then do;

      If your data contains special missing values, you can check for either an ordinary or special missing value with a statement that is similar to the following: 

      if numvar<=.z then do;

      To check for a missing character value, you can use a statement that is similar to the following: 

      if charvar=' ' then do;

      The MISSING function enables you to check for either a character or numeric missing value, as in: 

      if missing(var) then do;



Topic: Base Sas Interview Questions

No comments:

Post a Comment