OECD PIAAC Study 2013 data available for analysis

Data Analysis | R | Tableau

The recent PIAAC study of OECD covers skills of adults in 24 countries. You can find the study results together with background information here: http://www.oecd.org/site/piaac/. The study is very interesting as it is the first study targeting adult skills for these countries. Fortunately OECD has made the case data of the study available for download under the ‘public use files’ section. Here is the direct link to the files: http://vs-web-fs-1.oecd.org/piaac/puf-data

Data is available in SPSS and SAS format. I went for the SPSS format, downloaded all files (all countries except Australia) and processed the files using R. Here is the script I created for this purpose:

# OECD PIAAC Study 2013
# Interesting links:
#  SPSS files: http://vs-web-fs-1.oecd.org/piaac/puf-data


file_list <- list.files(path=”.”)
for (file in file_list){ 
  if (!exists(“piaac”)){
    piaac <- read.spss(file, to.data.frame=T)
  if (exists(“piaac”)){
    temp_dataset <-read.spss(file, to.data.frame=T)
    piaac<-rbind(piaac, temp_dataset)

# only include this set of variables:

# also include all score columns (10 per domain)
collist_lit<-names(piaac)[str_sub(names(piaac),start=1, end=5)==”PVLIT”]
collist_num<-names(piaac)[str_sub(names(piaac),start=1, end=5)==”PVNUM”]
collist_psl<-names(piaac)[str_sub(names(piaac),start=1, end=5)==”PVPSL”]

# combine score values to single values
p<-data.frame(piaac2[collist_meta], “Score_Lit”=rowMeans(piaac2[collist_lit]), “Score_Num”=rowMeans(piaac2[collist_num]), “Score_Psl”=rowMeans(piaac2[collist_psl]))

# remove incomplete cases
p<-melt(p, id=collist_meta)
p<-p[!is.na(p$value) & !is.na(p$CNTRYID_E),]

# write output to csv-file
write.table(p, col.names=T, row.names=F, file=”c:\\temp\\piaac.csv”, sep=”,”, dec=”.”, na=””, qmethod=”double”)

Or course, we could include additional variables, but for the moment, let’s focus on the variables from the script above. A full list with all variables and details is available using the “International Codebook” link in the “public use files” section.

The values can then be imported to Tableau desktop for further analysis.


The official charts are looking more dramatic (click here for an example) as the value axis does not start with zero. This is a common method of making results look more impressive or – as Prof. Hichert says – it adds a certain “lie-factor” to the visualization where it’s up to the creator of the chart to scale this factor to whatever effect is desired.

A result that somewhat surprises me, was that the skills decrease with higher age, as shown below for the reading skill (Lit) in Germany:


Here is the map visualization (from red over grey to green, again for the reading skills) for the European region as Tableau automatically maps country names to geographic regions.


Another interesting chart shows the relationship between income (shown as quantile here) and test score (again for the reading skills, showing some selected countries):


And finally the comparison between highest education (according to the International Standard Classification of Education) and score (again shown for reading skills) for Germany:


I’m not going into interpretations here. There is a lot of material available on the OECD website. The study is also a great source for demographic data. So if you like to discover more maybe the R script above can help you getting started.

This entry was posted in Allgemein and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s