Note to self: after updating R with Homebrew, remember to clean up old versions:

brew cleanup r

Otherwise I'm liable to get a segfault.
Just wanted to put up the script I've been using to create an Amazon EC2 Ubuntu instance for running RStudio, Git, and JAGS. There isn't anything really new in here, but it has been serving me well.
The script begins after the basic instance has been set up in the Amazon EC2 console (yhat has a nice post on how to do this, though some of their screenshots are a little old). Just SSH into the instance and get started.
Over the past few months I've added a few improvements to repmis, an R package of miscellaneous functions for reproducible research. I just want to briefly highlight two of them:
Caching downloaded data sets.
source_XlsxData for downloading data in Excel formatted files.
Both of these capabilities are in repmis version 0.2.9 and greater.
When working with data sourced directly from the internet, it can be time consuming (and make the data hoster angry) to repeatedly download the data. So, repmis's source functions (source_data and source_XlsxData) can now cache a downloaded data set by setting the argument cache = TRUE. For example:
DisData <- source_data("http://bit.ly/156oQ7a", cache = TRUE)
When the function is run again, the data set at http://bit.ly/156oQ7a will be loaded locally, rather than downloaded.
To delete the cached data set, simply run the function again with the argument clearCache = TRUE.
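Putting the two arguments together, a typical session might look like this (using the same example data set as above):

```r
library(repmis)

# First call downloads the data set and stores a local cached copy
DisData <- source_data("http://bit.ly/156oQ7a", cache = TRUE)

# Subsequent calls load the cached copy rather than downloading again
DisData <- source_data("http://bit.ly/156oQ7a", cache = TRUE)

# Delete the cached copy (the data is downloaded afresh)
DisData <- source_data("http://bit.ly/156oQ7a", clearCache = TRUE)
```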
I recently added the source_XlsxData function to download Excel data sets directly into R. This function works very similarly to the other source functions. There are two differences:

You need to specify the sheet argument. This is either the name of one specific sheet in the downloaded Excel workbook or its number (e.g. the first sheet in the workbook would be sheet = 1).

You can pass additional arguments for reading the sheet, such as startRow, as in the example below.
Here’s a simple example:
RRurl <- 'http://www.carmenreinhart.com/user_uploads/data/22_data.xls'
RRData <- source_XlsxData(url = RRurl, sheet = 2, startRow = 5)
Setting startRow = 5 drops the first four rows of the sheet before the data is read in.
After some delay (and because of helpful prompting by Giles Heywood and code contributions by John Harrison) d3Network now plays nicely with Shiny web apps. This means you can fully integrate R/D3.js network graphs into your web apps.
Here is what one simple example looks like:
An explanation of the code is here and you can download the app and play with it using:
This is reposted from the original on the Hertie School of Governance European Elections blog.
The discussion of issues around the European Parliament Elections has been beating around the bush for quite some time now. Karlheinz Reif and Hermann Schmitt famously described European Elections as "second-order elections", in that they are secondary to national elections. A few weeks ago on this blog Andrea Römmele and Yann Lorenz argued that the current election cycle has been characterised by personality politics between candidates vying for the Commission presidency, rather than substantive issues.
However, the election campaigns could be an important opportunity for the public to express their views on and even learn more about one of the defining changes to the European Union since the introduction of the Euro: the European Banking Union.
Much of the framework for the Banking Union has been established in the past year after intense debate between the EU institutions. A key component of the Union is that in November 2014, the European Central Bank (ECB) will become the primary regulator for about 130 of the Euro area's largest banks and will have the power to become the main supervisor of any other bank, should it deem this necessary to ensure "high standards".
A perennial complaint made against the EU is that it lacks transparency and accountability. While there are many causes of this (not least of which is poor media coverage of EU policy-making), the ECB's activities in the Banking Union certainly are less than transparent according to the rules currently set out. As Prof. Mark Hallerberg and I document in a recent Bruegel Policy Note, financial regulatory transparency in Europe, and especially in the Banking Union, is seriously lacking. Unlike in another large banking union, the United States, where detailed supervisory data is released every quarter, the ECB does not plan to regularly release any data on the individual banks it supervises.
This makes it difficult for citizens, especially informed watchdog groups, to independently evaluate the ECB’s supervisory effectiveness before it is too late, i.e. before there is another crisis.
The European Parliament has been somewhat successful in improving the transparency and accountability (paywall) of the ECB's future supervisory activities. Unlike in the original proposal, the Parliament now has the power to scrutinise the ECB's supervisory activities. It will nonetheless be constrained by strict confidentiality rules in its ability to freely access information and publish the information it does find.
In our paper, we also show how a lack of supervisory transparency is not exclusive to EU supervisors – the member state regulators, who will still directly oversee most banks, are in general similarly opaque. We found that only 11 (five in the Eurozone) out of 28 member states regularly release any supervisory data. Member state reporting of basic aggregate supervisory data to the European Banking Authority is also very inconsistent.
European Parliamentarians could use the increased attention that they receive during the election period to improve public awareness of the important role they have played in improving the transparency and accountability of new EU institutions. Perhaps, after the election, they could even use popular support that they may build for these activities during the election period to get stronger oversight capabilities and improve financial supervisory transparency in the European Banking Union.
In a bit of an aside from what I usually work on, I've put together a small website with a simple purpose: advocating for subway station exits to be numbered. These are really handy for finding your way around and are common in East Asia. But I've never seen them in Western countries.
If you're interested, check out the site:
A lot of progress has been made on improving political scientists' ability to access data 'programmatically', i.e. data can be downloaded and loaded with R source code. Packages such as WDI for the World Bank Development Indicators and dvn for many data sets stored on the Dataverse Network make it much easier for political scientists to use this data as part of a highly integrated and reproducible workflow.
There are nonetheless still many commonly used political science data sets that aren’t easily accessible to researchers. Recently, I’ve been using the Database of Political Institutions (DPI), Polity IV democracy indicators, and Reinhart and Rogoff’s (2010) financial crisis occurrence data. All three of these data sets are freely available for download online. However, getting them, cleaning them up, and merging them together is kind of a pain. This is especially true for the Reinhart and Rogoff data, which is in 4 Excel files with over 70 individual sheets, one for each country’s data.
Also, I’ve been using variables that are combinations and/or transformations of indicators in regularly updated data sets, but which themselves aren’t regularly updated. In particular, Bueno de Mesquita et al. (2003) devised two variables that they called the ‘winset’ and the ‘selectorate’. These are basically specific combinations of data in DPI and Polity IV. However, the winset and selectorate variables haven’t been updated alongside the yearly updates of DPI and Polity IV.
There are two big problems here:
A lot of time is wasted by political scientists (and their RAs) downloading, cleaning, and transforming these data sets for their own research.
There are many opportunities while doing this work to introduce errors. Imagine the errors that might be introduced and go unnoticed if a copy-and-paste approach is used to merge the 70 Reinhart and Rogoff Excel sheets.
As a solution, I’ve been working on a new R package called psData. This package includes functions that automate the gathering, cleaning, and creation of common political science data and variables. So far (February 2014) it gathers DPI, Polity IV, and Reinhart and Rogoff data, as well as creates winset and selectorate variables. Hopefully the package will save political scientists a lot of time and reduce the number of data management errors.
There certainly could be errors in the way psData gathers data. However, once spotted, errors can easily be reported on the package's Issues page. Once fixed, the correction will spread to all users via a package update.
There are two basic types of functions in psData: Getters and Variable Builders. Getter functions automate the gathering and cleaning of particular data sets so that they can easily be merged with other data. They do not transform the underlying data. Variable Builders use Getters to gather data and then transform it into new variables suggested by the political science literature.
To download only the polity2 variable from Polity IV:
# Load package
library(psData)

# Download polity2 variable
PolityData <- PolityGet(vars = "polity2")

# Show data
head(PolityData)

##   iso2c     country year polity2
## 1    AF Afghanistan 1800      -6
## 2    AF Afghanistan 1801      -6
## 3    AF Afghanistan 1802      -6
## 4    AF Afghanistan 1803      -6
## 5    AF Afghanistan 1804      -6
## 6    AF Afghanistan 1805      -6
Note that the iso2c variable refers to the ISO two-letter country code country ID. This standardised country identifier could be used to easily merge the Polity IV data with another data set. Another country ID type can be selected with the OutCountryID argument. See the package documentation for details.
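As a sketch of how that might look, the call below requests Correlates of War numeric country codes instead of the default iso2c (this assumes 'cown' is among the supported ID types; check the package documentation for the exact options):

```r
library(psData)

# Hypothetical example: return Correlates of War ('cown') country codes
# rather than the default iso2c identifier
PolityData <- PolityGet(vars = "polity2", OutCountryID = "cown")
```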
To create winset (W) and selectorate (ModS) data, use the following code:
WinData <- WinsetCreator()

head(WinData)

##    iso2c     country year    W ModS
## 1     AF Afghanistan 1975 0.25    0
## 2     AF Afghanistan 1976 0.25    0
## 3     AF Afghanistan 1977 0.25    0
## 15    AF Afghanistan 1989 0.50    0
## 16    AF Afghanistan 1990 0.50    0
## 17    AF Afghanistan 1991 0.50    0
Please feel free to suggest other data set downloading and variable creating functions. To do this just leave a note on the package’s Issues page or make a pull request with a new function added.