Prototyping open econ tools – COVEREDINBEES / A hive of TLDR & villainy

OUTLINE AIMS

Just ramble some stuff because this is a draft post! For God’s sake!

And write what I want out of it:

Link to some organisations to collab with. Define what that collab will look like, get feedback on this plan early.

Over the past couple of years, I’ve been building up a loose ‘regional economic data’ toolkit while trying to answer questions posed in my Y-PERN work (mainly with SYMCA). Some of it’s gathered on the regionaleconomictools website, the rest in various places on github. The live outputs page on the techie blog has gathered most of it in one place.

I want to have a go at making these tools better in collaboration with others, adding to the toolbox as we learn.

But I am hoping there are ways to dig deeper.

This is perfect ‘blind men and elephant’ parable territory (example applied to economics). Let’s apply it to regional analysis.

At the top, we’ve got “What does the economy look like” questions. We want direct answers - how is the economy structured, what’s growing, shrinking, why do we think this is happening, do we have any levers? We want to see as much of the elephant as we can.
Those lead straight to “What data/methods do we have to answer them?” Any one data source may help us grab one small part of the pachyderm. We try using different sources to get a grip on different bits, but the whole beast will likely still elude us.
Then there are awkward people saying things like, “Are we sure this is an elephant? I think maybe we have to go back to first principles here.”

The status of those last question types is tricky. As SYMCA’s Alice Rubbra eloquently explains here on the Y-PERN blog, it’s a clash of wotsits.

But this is one of the things I want to experiment with. It’s a good time to try:

because of the networks
because of being able to create space for it

Start with some humility - the world doesn’t have a shortage of people who’ll be willing to tell you they know The Answers.

Black box vs network [can mention some of the hollowing-out issues here that have no quick fix cf. Bradford community data]

What I think we have a chance to do collectively is a bit different - to create a shared sense of ground truth as we work together, built on the best understanding we can create between us.

A lot of that groundwork has been done already through regional network building. We have a decent chance of getting a handle on a good two-thirds to three-quarters of the elephant.

Why I think that comes under heading of “toolkit”. Overused but effective word, and can include the ways we can dig into the theory…

Prototyping what this niche is
What specific qualities these kinds of networks can bring and why they flow naturally from / are a good match for the direction of devolution

And why openness is key.

That last one covers both normative stuff and ‘theory of data / theory of change’ stuff.

We need enough knowledge to know our tools’ limitations. Muuuch easier said than done. Krugman: “We just don’t see what we can’t formalise”. Continued privileging of quant over lived experience / ground knowledge (see below)

(Also, these last two points are rarely so cleanly separated - back to that in a moment.)

Is toolkit useful word? Rabbit hole!

This is the level one might imagine a toolkit lurks.

Issues around avoiding data-driven questions, streetlight effect.
Issue of developing our own internal model - not just infographic cf. x-ray
What are we missing? What’s wrong with our data and methods?

The limitations here include: “you need to leave the house and talk to people.” This is the quant data / reality gap…

Then, somewhere in here: what do we want? That’s the normative stuff. Try to unpick examples here. E.g. assumptions around how to make everyone more wealthy are key there, right?

There are a few reasons for this.

Regional policymakers across the UK face many of the same economic questions, including identical asks from central government about their growth plans. The same data sources are re-used, the same assumptions. There’s a lot of wheel-reinvention.

But -

To mix metaphors horribly -

At the same time, we’re all working to develop our own identities as devolution deepens.

The only way to do this is to prototype - the key to test and learn. As that doc says:

At its simplest [it] involves (1) developing something; (2) making contact with reality; (3) learning from the results.

THE BELOW MIGHT BE A SEPARATE ARTICLE.

Things to mention:

Remain tool agnostic, code, no-code. I’m an R person, others won’t be, and there are up and downsides to going that route. Thinking agnostically, I can still provide links to processed data sources that others can use, and provide easy-ish-to-run examples…

Do data work openly where you can. It will support collaboration and learning. It will help build a shared sense of ground truth. It will avoid wheel reinvention.

I don’t have a dogmatic fealty to open source - I believe in a data/analysis mixed economy where private providers can do things others can’t and should be paid for that. Working in the open is just better for much of the analysis we need for regional policy success, because it creates space for those three things in the post’s subtitle. Let’s go through those again.

Collaborative learning (or: bootstrapping our capacity building together)
Shared-ground-truth-building
Non-wheel-duplication (a facet of reproducibility, which needs open methods; you can’t reproduce something if you can’t access it).

I’ve already seen the value of openness for my own process of getting from numbers to things-in-front-of-policymakers [frame better?]. But how to deepen the collaboration part?

The ‘shared ground truth’ thing has a couple of elements. I’ve heard more than once about data ‘black boxes’ that can result from some commissioned work. You get an analysis, but no access to the data and no way to really know how conclusions have been reached. It’s a one-time hit - once the commission’s done, you decide whether or not to take the results at face value, and then have to re-commission if more is needed later. [ties to point above about this not being a purely client rel - about that ‘shared truth’ thing between us where data is one facet… oh I do say this below!]

Again — I’m not saying that can’t be valuable and necessary, as organisations can supply expert analyses that often can’t be done in-house. I also carry out that kind of interpretation.

But open methods are more aligned with what I think is the reality of data-driven policymaking: everyone involved is an expert in what they do. This is a deeply collaborative endeavour. We don’t all have the same knowledge, but we also won’t progress much without bringing everyone’s together. Contrast that with the deficit model, which imagines a “transfer of information from experts to non-experts”. (This is a fairly common view of e.g. the university / policy relationship.)

Sometimes there’s straightforward transmission - if I’m teaching R coding methods to people who are new to it, say. But it’s never just that. I’ve seen this again and again in different settings - here, we brought some data skills but had little insight into what that Sheffield homelessness charity’s data actually meant. They made it make sense.

In regional policy, it’s been a little different but with the same outcome.

Two things: the above, then also capacity building / why open methods = learning/collab = capacity building (I think)

You probably wouldn’t want collaborative heart surgery, for example.

Because it’s open, anyone can contribute, copy, re-use etc.

But… so what? Realistic discussion about where that leaves us, regionally, trying to get through to shared things. What are the barriers? Where have my preconceptions been challenged? Why am I still holding on to this idea? What needs prototyping next?

[I can mention it’s now being re-used, which is what I was aiming for, but so far just by me…]

Build on open data, make the tools, build the capacity, link people together, do it questioningly, allow ways to get to the root and feed that back into the policy cycle, “walk asking”.

There are reasons why data work can’t be done openly (datasets that contain private information) or won’t be done openly (data/methods are IP a firm earns its living from; researchers protecting a lovingly curated dataset). But in regional policy, and with the vast body of data that the ONS, among others, holds, there’s plenty of scope for it.

This is in GVA post and should connect:

And then something on: why the original data sources are not enough. Just supplying data openly isn’t enough. Without capacity to see what’s in that data, you’re still blind. What exactly does that capacity look like and how do we build it?

Using the same pipeline, and all in code, we can build reports, maps, dashboards etc. and share them easily. Once we have that, we can puzzle our way to what it means for our regions, what theories of change we believe it supports, what blindspots and problems the data has — but a data platform like this underpinning it smooths the path considerably.
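To make the "same pipeline, several outputs" idea a bit more concrete, here is a minimal sketch. It is in Python purely for illustration (the tools described in this post are built in R), the figures are made-up placeholders rather than real ONS data, and every name in it (`PROCESSED_CSV`, `load_rows`, `growth_by_sector`, `text_report`, `dashboard_records`) is hypothetical. The point is only the shape: one processed table, loaded once, feeding both a readable report and a flat set of records that a dashboard or map could pick up.

```python
import csv
import io

# ASSUMPTION: placeholder figures standing in for a shared, processed dataset.
# These are NOT real values from any ONS (or other) source.
PROCESSED_CSV = """region,sector,year,value
Region A,Manufacturing,2021,100
Region A,Manufacturing,2022,110
Region A,Retail,2021,200
Region A,Retail,2022,190
Region B,Manufacturing,2021,50
Region B,Manufacturing,2022,60
"""


def load_rows(text):
    """Parse the shared processed table once; every output below reuses it."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["year"] = int(row["year"])
        row["value"] = float(row["value"])
    return rows


def growth_by_sector(rows, region, start, end):
    """Percentage change in value per sector between two years, for one region."""
    by_key = {
        (r["sector"], r["year"]): r["value"] for r in rows if r["region"] == region
    }
    sectors = sorted({sector for sector, _ in by_key})
    return {
        s: 100 * (by_key[(s, end)] - by_key[(s, start)]) / by_key[(s, start)]
        for s in sectors
        if (s, start) in by_key and (s, end) in by_key
    }


def text_report(region, growth):
    """Output 1: a short plain-text summary someone can read directly."""
    lines = [f"{region}: change by sector"]
    for sector, pct in growth.items():
        direction = "growing" if pct > 0 else "shrinking" if pct < 0 else "flat"
        lines.append(f"- {sector}: {pct:+.1f}% ({direction})")
    return "\n".join(lines)


def dashboard_records(rows, regions, start, end):
    """Output 2: the same numbers as flat records for a dashboard or map layer."""
    return [
        {"region": region, "sector": sector, "pct_change": round(pct, 1)}
        for region in regions
        for sector, pct in growth_by_sector(rows, region, start, end).items()
    ]


rows = load_rows(PROCESSED_CSV)
growth_a = growth_by_sector(rows, "Region A", 2021, 2022)
report = text_report("Region A", growth_a)
records = dashboard_records(rows, ["Region A", "Region B"], 2021, 2022)
print(report)
```

Both outputs come from the same `growth_by_sector` step, so a fix or a new data release only has to land in one place. That says nothing about what the numbers mean for a region, which is still the harder, shared job.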

More recently, I’ve been re-using these tools to help other Yorkshire government bodies get insights quickly. This came about quite serendipitously, but it’s helped convince me it’s a good way to work. [Explain something about how much it speeds things up / how cost effective it is]

(This is all on my github where I’m trying to corral more of it into the regionalecontools site, itself a github repo; the website’s all generated using R.)

I’m not suggesting how I’ve approached this is anything like the best way - but I am arguing that building these pipelines openly leads to good things that otherwise couldn’t happen. What I’d love to see is what others can do, and what kind of tools we could build together.

I will now proceed to go on about this at length, starting with a mildly trite statement that I’ll then unpick.