22 July 2014

Note to self: brew cleanup r

Note to self: after updating R with Homebrew, remember to clean up old versions:

brew cleanup r
Otherwise I'm liable to get a segfault.

28 June 2014

Simple script for setting up R, Git, and JAGS on an Amazon EC2 Ubuntu instance

Just wanted to put up the script I've been using to create an Amazon EC2 Ubuntu instance for running RStudio, Git, and JAGS. There isn't anything really new in here, but it has been serving me well.

The script begins after the basic instance has been set up in the Amazon EC2 console (yhat has a nice post on how to do this, though some of their screenshots are a little old). Just SSH into the instance and get started.

11 May 2014

Updates to repmis: caching downloaded data and Excel data downloading

Over the past few months I’ve added a few improvements to repmis, an R package of miscellaneous functions for reproducible research. I just want to briefly highlight two of them:

  • Caching downloaded data sets.

  • source_XlsxData for downloading data in Excel formatted files.

Both of these capabilities are in repmis version 0.2.9 and greater.

Caching

When working with data sourced directly from the internet, it can be time-consuming (and annoying for whoever hosts the data) to repeatedly download the same data set. So repmis’s source functions (source_data, source_DropboxData, and source_XlsxData) can now cache a downloaded data set when the argument cache = TRUE is set. For example:

DisData <- source_data("http://bit.ly/156oQ7a", cache = TRUE)

When the function is run again, the data set at http://bit.ly/156oQ7a will be loaded locally, rather than downloaded.

To delete the cached data set, simply run the function again with the argument clearCache = TRUE.
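
For example, re-running the call from above with clearCache = TRUE removes the locally stored copy, so the next cached call starts with a fresh download:

# Delete the cached copy of the data set
DisData <- source_data("http://bit.ly/156oQ7a", clearCache = TRUE)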

source_XlsxData

I recently added the source_XlsxData function to download Excel data sets directly into R. This function works very similarly to the other source functions. There are two differences:

  • You need to specify the sheet argument. This is either the name of one specific sheet in the downloaded Excel workbook or its number (e.g. the first sheet in the workbook would be sheet = 1).

  • You can pass other arguments to the read.xlsx function from the xlsx package.

Here’s a simple example:

RRurl <- 'http://www.carmenreinhart.com/user_uploads/data/22_data.xls'

RRData <- source_XlsxData(url = RRurl, sheet = 2, startRow = 5)

Setting startRow = 5 tells read.xlsx to begin reading at the fifth row, skipping the first four rows of the sheet.
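
The sheet can also be selected by name rather than number. For example (the sheet name below is just a placeholder; use whatever the workbook actually calls it):

RRData <- source_XlsxData(url = RRurl, sheet = 'Sheet1', startRow = 5)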

9 May 2014

d3Network Plays Nice with Shiny Web Apps

After some delay (and because of helpful prompting by Giles Heywood and code contributions by John Harrison) d3Network now plays nicely with Shiny web apps. This means you can fully integrate R/D3.js network graphs into your web apps.

An explanation of the code for a simple example app is here, and you can download and run the app yourself with:

shiny::runGitHub('d3ShinyExample', 'christophergandrud')
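
For reference, the basic pattern is to render the graph's HTML in server.R and drop it into the page with htmlOutput in ui.R. The sketch below is only a rough illustration: the d3SimpleNetwork arguments standAlone and parentElement, and the toy data, are assumptions on my part, so treat the example app above as the canonical code.

# server.R (sketch)
library(shiny)
library(d3Network)

# Toy edge list: a data frame whose first two columns are source and target
networkData <- data.frame(src = c("A", "A", "B"), target = c("B", "C", "C"))

shinyServer(function(input, output) {
    output$networkPlot <- renderPrint({
        # Write the graph's HTML/JavaScript into the #networkPlot element
        d3SimpleNetwork(networkData, width = 600, height = 400,
                        standAlone = FALSE, parentElement = '#networkPlot')
    })
})

# ui.R (sketch)
shinyUI(fluidPage(
    tags$head(tags$script(src = 'http://d3js.org/d3.v3.min.js')),
    htmlOutput('networkPlot')
))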

European Parliament Candidates Have a Unique Opportunity to Advocate for Banking Union Transparency and Accountability

This is reposted from the original on the Hertie School of Governance European Elections blog.

The discussion of issues around the European Parliament Elections has been beating around the bush for quite some time now. Karlheinz Reif and Hermann Schmitt famously described European Elections as “second-order elections”, in that they are secondary to national elections. A few weeks ago on this blog Andrea Römmele and Yann Lorenz argued that the current election cycle has been characterised by personality politics between candidates vying for the Commission presidency, rather than substantive issues.

However, the election campaigns could be an important opportunity for the public to express their views on and even learn more about one of the defining changes to the European Union since the introduction of the Euro: the European Banking Union.

Much of the framework for the Banking Union has been established in the past year after intense debate between the EU institutions. A key component of the Union is that in November 2014, the European Central Bank (ECB) will become the primary regulator for about 130 of the Euro area’s largest banks and will have the power to become the main supervisor of any other bank, should it deem this necessary to ensure “high standards”.

A perennial complaint against the EU is that it lacks transparency and accountability. While there are many causes of this (not least of which is poor media coverage of EU policy-making), the ECB’s activities in the Banking Union are certainly less than transparent under the rules currently set out. As Prof. Mark Hallerberg and I document in a recent Bruegel Policy Note, financial regulatory transparency in Europe, and especially in the Banking Union, is sorely lacking. Unlike in another large banking union, the United States, where detailed supervisory data is released every quarter, the ECB does not plan to regularly release any data on the individual banks it supervises.

This makes it difficult for citizens, especially informed watchdog groups, to independently evaluate the ECB’s supervisory effectiveness before it is too late, i.e. before there is another crisis.

The European Parliament has been somewhat successful in improving the transparency and accountability (paywall) of the ECB’s future supervisory activities. Unlike in the original proposal, the Parliament now has the power to scrutinise the ECB’s supervisory activities. It will nonetheless be constrained by strict confidentiality rules in its ability to freely access information and to publish what it does find.

In our paper, we also show how a lack of supervisory transparency is not exclusive to EU supervisors – the member state regulators, who will still directly oversee most banks, are in general similarly opaque. We found that only 11 (five in the Eurozone) out of 28 member states regularly release any supervisory data. Member state reporting of basic aggregate supervisory data to the European Banking Authority is also very inconsistent.

European Parliamentarians could use the increased attention they receive during the election period to improve public awareness of the important role they have played in making new EU institutions more transparent and accountable. Perhaps, after the election, they could even use the popular support they build for these activities to push for stronger oversight capabilities and better financial supervisory transparency in the European Banking Union.

30 March 2014

Numbering Subway Exits

In a bit of an aside from what I usually work on, I've put together a small website with a simple purpose: advocating for subway station exits to be numbered. Numbered exits are really handy for finding your way around and are common in East Asia, but I've never seen them in Western countries.

If you're interested, check out the site:

23 February 2014

Programmatically download political science data with the psData package

A lot of progress has been made on improving political scientists’ ability to access data ‘programmatically’, i.e. data that can be downloaded with R source code. Packages such as WDI for the World Bank Development Indicators and dvn for many data sets stored on the Dataverse Network make it much easier for political scientists to use this data as part of a highly integrated and reproducible workflow.

There are nonetheless still many commonly used political science data sets that aren’t easily accessible to researchers. Recently, I’ve been using the Database of Political Institutions (DPI), the Polity IV democracy indicators, and Reinhart and Rogoff’s (2010) financial crisis occurrence data. All three of these data sets are freely available for download online. However, getting them, cleaning them up, and merging them together is kind of a pain. This is especially true for the Reinhart and Rogoff data, which is spread across four Excel files with over 70 individual sheets, one for each country’s data.

Also, I’ve been using variables that are combinations and/or transformations of indicators in regularly updated data sets, but which themselves aren’t regularly updated. In particular, Bueno de Mesquita et al. (2003) devised two variables that they called the ‘winset’ and the ‘selectorate’. These are basically specific combinations of data in DPI and Polity IV. However, the winset and selectorate variables haven’t been updated alongside the yearly updates of DPI and Polity IV.

There are two big problems here:

  1. A lot of time is wasted by political scientists (and their RAs) downloading, cleaning, and transforming these data sets for their own research.

  2. There are many opportunities while doing this work to introduce errors. Imagine the errors that might be introduced and go unnoticed if a copy-and-paste approach is used to merge the 70 Reinhart and Rogoff Excel sheets.

As a solution, I’ve been working on a new R package called psData. This package includes functions that automate the gathering, cleaning, and creation of commonly used political science data and variables. So far (February 2014) it gathers the DPI, Polity IV, and Reinhart and Rogoff data, and creates the winset and selectorate variables. Hopefully the package will save political scientists a lot of time and reduce the number of data management errors.

There certainly could be errors in the way psData gathers data. However, once spotted, errors can easily be reported on the package’s Issues page, and once fixed, the correction is spread to all users via a package update.

Types of functions

There are two basic types of functions in psData: Getters and Variable Builders. Getter functions automate the gathering and cleaning of particular data sets so that they can easily be merged with other data. They do not transform the underlying data. Variable Builders use Getters to gather data and then transform it into new variables suggested by the political science literature.

Examples

To download only the polity2 variable from Polity IV:

# Load package
library(psData)

# Download polity2 variable
PolityData <- PolityGet(vars = "polity2")

# Show data
head(PolityData)


##   iso2c     country year polity2
## 1    AF Afghanistan 1800      -6
## 2    AF Afghanistan 1801      -6
## 3    AF Afghanistan 1802      -6
## 4    AF Afghanistan 1803      -6
## 5    AF Afghanistan 1804      -6
## 6    AF Afghanistan 1805      -6

Note that the iso2c variable contains ISO two-letter country codes, which serve as the country ID. This standardised country identifier makes it easy to merge the Polity IV data with other data sets. A different country ID type can be selected with the OutCountryID argument. See the package documentation for details.
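
For instance, to get Correlates of War numeric country codes instead (this assumes psData accepts countrycode-style identifiers such as 'cown'; check the documentation for the exact options):

# Download polity2 with Correlates of War country codes
PolityCowData <- PolityGet(vars = "polity2", OutCountryID = "cown")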

To create winset (W) and selectorate (ModS) data use the following code:

WinData <- WinsetCreator()

head(WinData)


##    iso2c     country year    W ModS
## 1     AF Afghanistan 1975 0.25    0
## 2     AF Afghanistan 1976 0.25    0
## 3     AF Afghanistan 1977 0.25    0
## 15    AF Afghanistan 1989 0.50    0
## 16    AF Afghanistan 1990 0.50    0
## 17    AF Afghanistan 1991 0.50    0

Install

psData should be on CRAN soon, but while it is in the development stage you can install it with the devtools package:

devtools::install_github('psData', 'christophergandrud')
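
Once it is on CRAN, the usual install should work:

install.packages('psData')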

Suggestions

Please feel free to suggest other data set downloading and variable creation functions. To do this, just leave a note on the package’s Issues page or make a pull request with a new function added.