Semantic Fabric Theory - Nate's Notes

# Fabric Theory - Draft/Outline

### Why even bother?

A good paragraph overview is found in [fabric and the future](Abstractions%20at%20Unsupervised.md#4%20Fabric%20and%20the%20Future). Essentially, we need to recall that pattern find is **weighted graph search**. Data scientists historically have manually built out that weighted graph so that pattern find could search it. They did this by combining:

* An understanding of what features could legally be created
* An understanding of what features the customer wants created
* An understanding of what features the customer should want created

This is a tremendous amount of work and takes a massive amount of time. Fabric is an attempt to intelligently automate this, or to let user input configure it.

### Tasks

* Everything in the fabric world is a graph. The nodes of the graph are **tasks**. The **language space** that we work within in fabric is *tasks*. You can take SQL and directly *translate* it to fabric graphs and back. It's a language in the sense that logical plans are languages.

### What does fabric do?

**Fabric composes two external services together**

* It composes the database and the dask cluster together.
* Generally we will want to take information (metadata) that is stored in the database and use it to do things in the cluster. However, it can also be the other way around. For instance, we could use the cluster to load in the data, in which case it will fetch information about the partitions, then create entries in the database to represent the load operation.

**Fabric is a database management system**

Fabric is a layer that sits atop a giant free-form columnset.

### Encodings: What is the purpose?

The encodings ontology (graph) has several purposes.

1. To support the fabric task library. Specifically, we want to be able to automate the answer to the question: "Given a specific **attribute**, what **tasks** are available to me for this attribute?".
Put another way, if we are at node `x` in the graph, how can we determine, *without touching the data*, what other operations we can take from here? If for example `x` is `state name`, we know that we cannot apply an average. The key here is that to infer this we only needed to look at the encoding name; we did not need to touch the data! The task library has each task annotated with valid encodings. This functions as a type system. Given a particular encoding and a specific task to apply, we can be sure of the input type (due to knowing the encoding) and the output type (due to having a transformation rule from one encoding type to another).

2. Currently, encodings give a very binary answer to whether or not you can do a task. For instance, given an attribute pair of latitude/longitude, their encoding type may tell us that we can transform to state. Currently encodings will simply provide a "yes, this transformation is allowed". However, that does not answer the question "should you make this transformation?" This is where semantics can come in.

### Why is graph based better than relational?

The **fabric task graph** is an abstraction of operations on data. Operations have an intrinsic ordering and strict dependencies, so by defining a set of operations you have described a DAG whether or not you intended to. So, the *fabric task graph* is a *metadata layer*. Note that the *data layer* has the *compute graph* (i.e. the dask graph). Keeping track of these DAGs allows you to compile and optimize them more readily. Programs in any programming language can be expressed as a DAG. When you write a program it is effectively procedural, so the procedures run in order. A compiler that sees the DAG can determine that two pieces of code do not depend on each other and can be executed at the same time.

### Attribute Space

Attribute Space can be thought of as a projection from the task DAG to some other space. It could be a metric space (embeddings), or it could be more in line with Markov models.
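
One way to picture such a projection, purely as an illustrative sketch: forget the direction of the edges in the task DAG and use the shortest-path hop count between two tasks as their distance, which gives a metric space over attributes. The task names, the `TASK_DAG` dictionary, and the choice of hop count as the distance are all assumptions made for this example, not a description of how fabric computes attribute space.

```python
from collections import deque

# Hypothetical task DAG: each task maps to the tasks derived from it.
TASK_DAG = {
    "load_orders": ["latitude", "longitude", "order_total"],
    "latitude": ["state_name"],
    "longitude": ["state_name"],
    "order_total": ["avg_order_total"],
    "state_name": [],
    "avg_order_total": [],
}


def undirected_neighbors(dag):
    """Forget edge direction so that the resulting distance is symmetric."""
    neighbors = {task: set() for task in dag}
    for parent, children in dag.items():
        for child in children:
            neighbors[parent].add(child)
            neighbors[child].add(parent)
    return neighbors


def hop_distances(dag, source):
    """Breadth-first search: edge count from `source` to every reachable task."""
    neighbors = undirected_neighbors(dag)
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in distances:
                distances[nxt] = distances[current] + 1
                queue.append(nxt)
    return distances


distances_from_state = hop_distances(TASK_DAG, "state_name")
print(distances_from_state["latitude"])         # 1
print(distances_from_state["avg_order_total"])  # 4
```

In this toy graph `state_name` sits next to `latitude` and far from `avg_order_total`; an embedding-based projection could swap the hop count for some other distance.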
There are many ways we can perform this projection, but at the end of the day we pick the space we wish to project into.

### Where do semantics fit in?

Semantics are ways that people can annotate the task graph with subjective information. Encodings contain objective truth that everyone could agree on. Semantics contain subjective information. The vision is that semantics will be the interface through which people communicate with the AI (via natural language). We can think of semantics as **constraints** (pruning our attribute space or task graph), as well as **additive** (the user telling the AI that it *should* do something).

### Views vs Perspectives

The operational definition of a view is a collection of attributes (tasks) that share the same perspective. In some sense a view could be represented as a dataframe: we can't have a dataframe where two columns represent different indices. If we think of it this way then a perspective itself is a view. In a sense, a perspective is no longer a thing that you build. Rather, it is more aptly described as "self-organizing" (see [here](https://www.notion.so/unsupervised/Fabric-Deep-Dive-0b2fcf5ed81f4e2d9c947d41970e7e5f)).

### Perspective

An encoding tuple uniquely defines a perspective. See:

* [here](https://www.notion.so/unsupervised/Entities-Perspective-Encoding-Tuple-123ba10245b647949d6740a0e944fbae)
* [here](https://unsuper.slack.com/archives/C0108L5DU4W/p1605629365127800?thread_ts=1605563491.125400&cid=C0108L5DU4W)
* [here](https://www.notion.so/unsupervised/Perspective-ebd49c5ff922411b9b72701dcda97a7a)

# Fabric Caching

**Fabric attributes are fundamentally (partitioned/distributed) key-value lookups of data defined by the graph that produces them.
Each attribute itself has a key, and can be stored in a larger, meta store known as fabric.**

---

References:

* [Semantic Fabric Theory Questions](Semantic%20Fabric%20Theory%20Questions.md)
* [Conversation with JW](https://unsupervised.zoom.us/rec/play/7XMnYlB7f4mKRUtnNjBSy-_1Re3YaboAemDK1f9f_Y1KnElb3sb5rIJCQGajXc54nEP1yMS1FLd_6tMa.scJLlQjRSMUf2LQy?autoplay=true&startTime=1619028422000&_x_zm_rtaid=mxyypPKcQV6B9h3xpU6G4w.1619048895452.3eef56cda1f91ce5b163f539ad8f0f71&_x_zm_rhtaid=137)
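
To make the Fabric Caching idea concrete, here is a minimal sketch in which an attribute's key is a hash of the graph that produces it, and a plain dictionary stands in for the fabric meta store. The names `attribute_key` and `FabricStore`, the layout of the `graph` dictionary, and the use of SHA-256 are all hypothetical choices for illustration; partitioning and distribution are left out entirely.

```python
import hashlib
import json


def attribute_key(graph, target):
    """Derive a deterministic key from the sub-graph that produces `target`.

    `graph` maps a task name to (operation, list of input task names).
    """
    operation, inputs = graph[target]
    # Recurse so the key changes whenever any upstream task changes.
    payload = [operation, [attribute_key(graph, name) for name in inputs]]
    encoded = json.dumps(payload).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class FabricStore:
    """Dictionary-backed stand-in for the meta store (an assumption)."""

    def __init__(self):
        self._values = {}
        self.compute_count = 0

    def get_or_compute(self, graph, target, compute):
        """Return the cached value for `target`, computing it on a miss."""
        key = attribute_key(graph, target)
        if key not in self._values:
            self._values[key] = compute()
            self.compute_count += 1
        return self._values[key]


graph = {
    "orders": ("load:orders.parquet", []),
    "order_total": ("select:total", ["orders"]),
    "avg_order_total": ("mean", ["order_total"]),
}

store = FabricStore()
first = store.get_or_compute(graph, "avg_order_total", lambda: 42.0)
second = store.get_or_compute(graph, "avg_order_total", lambda: 0.0)  # cache hit
print(first, second, store.compute_count)
```

Because the key depends only on the producing graph, two requests for the same attribute share one stored value, while changing any upstream task yields a new key.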