- Causal models are about explicating confounders, that is, identifying variables that influence both the cause and the effect.
- Having an explicit causal model at hand allows for mechanical reasoning about counterfactuals, using Judea Pearl's do-calculus.
- A Scala-based runtime loads observations into a model and samples predictions under action and inaction.
- Any Constellation Metagraph is a valid data source. Causal models are uploaded to and dynamically evaluated on the DeSciNet Metagraph.
- Defines discrete observations over time, compatible with all Metagraphs, and allows for efficient sampling in a dynamically compiled Causal VM.
- DeSciNet implements a Causal Reasoning Runtime able to sample arbitrary user-defined models. Only possible on Constellation!
Inspiration
Personally, I'm very interested in automatic causal reasoning to enable fully automatic decision making based solely on declarative user objectives and explicit causal assumptions (as opposed to imperative strategies and implicit causal assumptions).
I believe reasoning through action and effect is the underlying abstraction that unifies politics and money, as well as social coordination at large. And with the do-calculus, reasoning through explicated causal assumptions is mechanizable! Further, Constellation Network provides the data validation and normalization means to verify arbitrary observations and feed them into those Structural Causal Models.
The only thing missing is a process that gradually triangulates a representative model of the real world. Whatever the surprise-minimized consensus model currently is, it then allows us to reason about hypothetical actions and how they would change reality.
DeSciNet is such a crowd-sourced epistemic model discovery process!
What it does
Users upload Structural Causal Models that predict state changes of other Metagraphs. All models are ranked, creating a global competition to understand our physical world with increasing precision.
How we built it
It all builds on the mathematical foundations of Structural Causal Models (SCMs). Broadly speaking, an SCM consists of unmodelled background variables and modelled endogenous variables.
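For reference, this follows Pearl's standard definition: an SCM is a tuple M = ⟨U, V, F, P(U)⟩, where U are the exogenous background variables with prior distribution P(U), V the endogenous variables, and F = {f_i} the structural equations, each assigning V_i = f_i(PA_i, U_i) from its parents PA_i ⊆ U ∪ V.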
Background variables are substituted by L0 queries of arbitrary Metagraphs. They are declared by users. Currently the data is uploaded manually, with the exogenous variable's proposer being the only authority allowed to append further observations. In future releases, these appends should be signed by the respective source Metagraphs.
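For illustration, an appended observation for such a background variable could look like this (a hypothetical payload shape, not the deployed format):

{
  "externalVariable": "Measured_longitude",
  "time": 1718000000,
  "value": 13.40495
}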
Endogenous variables are explicated by equations, which together make up the causal model. Such causal models are uploaded by users and look like this:
{
  "externalVariables": {
    "Measured_longitude": "1fe755bb73679a35cff80642e5575336d4dd71ab92e08065d7f5729604d4390c",
    "Measured_latitude": "ab88af652530ea855f75b2af5c06a3e5c34bb0f96d0cbfa36f2362a29d9c8a05"
  },
  "internalVariables": {
    "Human_longitude": "latest(Measured_longitude, t) + randomGaussian() * epsilon * sqrt(t - latestTime(Measured_longitude, t))",
    "Human_latitude": "latest(Measured_latitude, t) + randomGaussian() * epsilon * sqrt(t - latestTime(Measured_latitude, t))",
    "epsilon": "1.0"
  }
}
Each model references a time series of observational data via the L0 hashes of such variables in the DeSciNet Metagraph; in this case, a Google Timeline location history. Individual measurements of these external variables are integrated into the model's internal variables.
This simple model assumes that human movement follows a normal distribution centered around the latest measured value, with a standard deviation that grows with the square root of the time difference. This means short trips are more likely and long trips in a short amount of time are less likely, but as time passes, larger changes become more probable.
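For intuition, here is a minimal Scala sketch of sampling that equation. All names here (Observation, RandomWalkModel, sampleHumanLongitude) are illustrative, not the actual DeSciNet runtime API:

import scala.util.Random

final case class Observation(time: Long, value: Double)

object RandomWalkModel {
  val epsilon: Double = 1.0
  private val rng = new Random()

  // latest(Measured_longitude, t) + randomGaussian() * epsilon
  //   * sqrt(t - latestTime(Measured_longitude, t))
  def sampleHumanLongitude(measured: Vector[Observation], t: Long): Double = {
    val latest = measured.filter(_.time <= t).maxBy(_.time)
    latest.value + rng.nextGaussian() * epsilon * math.sqrt((t - latest.time).toDouble)
  }
}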
All models must conform to a time-based dynamic model structure that incorporates Metagraphs as data sources and operates in discrete time. More on the mathematical background can be found in the DeSciNet Framework.
This allows for a general-purpose algorithm to quantify the accuracy (surprise) of any given model on a set of observations. Such a domain-invariant ranking system allows for a single competition to model the underlying physical processes for all data we have available.
So in this case, any model that assigns an exceptionally high probability to jumping from airport to airport would rank better on GPS timelines that include a flight. And any model incorporating train schedules and car travel along street layouts would score even better. And so on, until reality is accurately modeled.
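To make the ranking concrete, one way to score surprise is the negative log-likelihood of the observations under the model, with densities estimated from samples. A sketch under those assumptions (the Gaussian kernel-density estimate, the bandwidth, and all names are illustrative, not the deployed algorithm):

object Surprise {
  def surprise(
      sampleAt: Long => Double,         // draws one model sample for time t
      observed: Vector[(Long, Double)], // (time, measured value) pairs
      nSamples: Int = 1000,
      bandwidth: Double = 0.001
  ): Double =
    observed.map { case (t, x) =>
      val samples = Vector.fill(nSamples)(sampleAt(t))
      // estimate the model's density at the observed value from the samples
      val density = samples.map { s =>
        val z = (x - s) / bandwidth
        math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2.0 * math.Pi))
      }.sum / nSamples
      -math.log(density max Double.MinPositiveValue) // guard against log(0)
    }.sum // lower total surprise = better model
}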
Challenges we ran into
Initially I thought I could just abstractly refer to any arbitrary Metagraph via its address, consider its L0 state, and have the user specify a Scala function that traverses that state and reduces it to a single value of a generic type 'T. That would be the ultimate abstraction of observations while still having a proof that the produced data conforms to that Metagraph's consensus state.
However, I found out that it's not that simple. First of all, most data is only indirectly referenced through hashes. Second, even if one can resolve the binary data behind those hashes, some data schema is needed to interpret it as structured types one can compute over.
So to compute a query over another Metagraph's state, somebody needs to run a node for that Metagraph whose code implements the correct data schema, usually through a Scala type hierarchy fitted to the underlying data. My DeSciNet Metagraph would then need to query those nodes via HTTP and import the reduced result into its own state, implicitly trusting the source Metagraph node to return the correct information.
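A minimal sketch of that trust-based import path (the endpoint layout and plain-string response are assumptions for illustration, not the actual node API):

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object CrossMetagraphQuery {
  private val client = HttpClient.newHttpClient()

  def queryReducedValue(nodeBaseUrl: String, queryPath: String): String = {
    val request = HttpRequest.newBuilder()
      .uri(URI.create(s"$nodeBaseUrl/$queryPath")) // assumed endpoint layout
      .GET()
      .build()
    // Implicitly trusts the node's answer: there is no proof that it
    // matches the source Metagraph's consensus state.
    client.send(request, HttpResponse.BodyHandlers.ofString()).body()
  }
}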
Accomplishments that we're proud of
I'm most proud of finding the formulation that captures the abstract natures of both the Hypergraph and Structural Causal Models.
It makes no assumption about the type of data used for observations while adhering to a data format that all Metagraphs implicitly implement.
At the same time, all causal models defined in this form are inherently compatible with the do-calculus, so they have utility in automated decision making: their form allows for exact computation of counterfactual probability distributions, that is, predictions about the effects of hypothetical actions with no assumptions beyond the model itself. So if we have an accurate model, we can accurately reason through counterfactual situations.
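Concretely, a counterfactual query runs Pearl's three steps: abduction (infer the background variables consistent with what was observed), action (clamp the intervened variables with do(X = x)), and prediction (evaluate the remaining equations). A schematic sketch via rejection sampling, with illustrative types and names rather than the runtime's API:

final case class Scm[U](
    sampleU: () => U, // prior over background (exogenous) variables
    // structural equations; the second argument holds variables clamped by do()
    mechanisms: (U, Map[String, Double]) => Map[String, Double]
)

object Counterfactual {
  def run[U](
      model: Scm[U],
      evidence: U => Boolean,            // 1. abduction: keep U consistent with observations
      intervention: Map[String, Double], // 2. action: do(X = x)
      nSamples: Int
  ): Vector[Map[String, Double]] =
    Vector.fill(nSamples)(model.sampleU())
      .filter(evidence)                            // posterior over U by rejection sampling
      .map(u => model.mechanisms(u, intervention)) // 3. prediction under do()
}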
I'm looking to use this kind of automatic reasoning for algorithmic trading, and to make our global economy more interactive and proactive, since we can address many more types of value than scalar currency units.
What we learned
First of all, I learned how to fit time into SCMs through the use of dynamic models.
Further I learned a lot about the structure of Metagraphs and how they run on the Hypergraph.
Also, I found out that cross-Metagraph data imports must become more streamlined for Constellation Network. It feels like unnecessary IO work and consensus risk to import already-validated state query results from other Metagraphs. Sure, we need to access the computed state for evaluation, but we should at least have a proof that the query result was sourced from the other Metagraph's consensus state under a given interpretation, instead of blindly trusting various L0 nodes.
What's next for DeSciNet: Crowd-sourced causal modelling at scale
The immediate next todos are:
- Harden the Causal Reasoning Runtime for surprise-testing
- Extend the Causal Reasoning Runtime to evaluate the do-calculus, enabling abduction, action, and prediction through latent variables (= full counterfactual reasoning)
- Implement a distributed sampling pipeline where any node can submit SCM evaluation samples and the network aggregates a model's total surprise over a given set of observations.
- Implement a DESCI token economy that lets people stake on certain sets of variables as an economically relevant indicator of which outcomes Causal Model Engineers should surprise-minimize, with the rewards paid to consensus-model authors being proportional to that stake.
- Implement a trustless way to import data from other Metagraphs as measurements into the DeSciNet Metagraph by encoding a universally reproducible data schema in the external variable definitions (CQL, the Categorical Query Language, may help here for categorical data migration). Utilizing CQL for data schema definitions further allows for more complex observation types than simply Double.
- Implement model-based reinforcement learning on top of SCMs to automate trajectory mining and assist in decision making
- Instead of allowing arbitrary string labels for endogenous variables within causal models, enforce variables to be expressed in IEML, the Information Economy MetaLanguage. This gives every variable a defined address on the semantic sphere, a topology of all possible concepts based on non-commutative geometry. All concepts are derived from six semantic primitives (virtual, actual, sign, being, thing, emptiness) and a regular grammar (substance x attribute x mode). This interlinks all variable names in the semantic realm and disambiguates the same variable across all spoken human languages. That way, there is no divergence anymore between, e.g., a Chinese model and a German model that are, causally speaking, the same.
- Finally, I'm looking to build a non-monetary economy with ask.network as the longer-term vision. It uses the global surprise-minimized consensus SCM to guide and settle negotiations over action and effect. It effectively coordinates who does what so that we shape reality the way we want, with everybody specifying their own value set and the network generating trajectories for each user that lead to the most expected realizable value.
Built With
- euclid
- scala
- scm
