
Reflections: Semantic Representations for Decision-Making (In Financial Markets)

Over the last several months I have been working on an exciting project to improve connections between data and knowledge resources related to climate change and food insecurity. The goal of the project has been to classify and create ontologies of applied agriculture research and open linked data. While necessary, this work is not incredibly cutting edge. However, my approach was different and, if successful, has broad application in information retrieval and the semantic web. My proposal was based on a straightforward idea: explore machine learning to infer semantic relationships between open linked data and knowledge resources published by U.S. university extension systems.
While the idea is certainly straightforward, it has been anything but simple. Nevertheless, I have persisted, hacking away obsessively at this project for almost eight months.

During this time, I began working on another idea related to a concept I call 'resilience-based investment.' Very simply, the goal of this investment strategy is to go long on investments that support regional and global resilience, while shorting investments that perpetuate regional volatility through risk exposure and system-level connectivity. It is an attempt to incentivize a redirection of capital into the resilience and risk management/mitigation/adaptation economy, accelerating a widespread shift in the character of global markets (more to come).

Conceptually, the tasks of creating a tool to learn (ad hoc) ontologies from text corpora and developing strategies for resilience investing might appear disjoint and unrelated. However, these two tasks converge when it comes to monitoring the risk of disruption. For instance, we know climate change introduces significant challenges to sustained food security (IPCC 2014).

Intergovernmental Panel on Climate Change. (2014). Climate Change 2014–Impacts, Adaptation and Vulnerability: Regional Aspects. Cambridge University Press.

Due to sea level rise, some communities will face reduced availability of arable land, forcing them to choose between human settlements and food production. At the same time, climate change is expected to open up ecosystems to new disturbances from plant pathogens and fire. Couple this with rates of natural resource depletion in raw materials such as precious metals, clean water, and other building materials, and one quickly begins to realize the interconnected character of risk in the modern economy. Indeed, every one of these materials is tied to some value, and companies are connected to these materials because they serve as primary enablers of corporate value (i.e. the factors of production). These impacts present significant risks to the function and operation of nearly every facet of the global financial system.

Now consider the original task: classify and generate ontologies of applied research and open linked data. The ability to classify streams of text and data resources so that communities can collectively solve localized climate change challenges has broad application, not just for improving community search but also for event monitoring and decision-making. In particular, the ability to monitor and classify events in real time, specifically events that directly affect the functioning of markets, can be especially valuable to firms looking to hedge risk in the commodities and futures markets, and also in equity markets, where firms are tightly coupled to the price and reliable availability of raw materials and capital.
Upon recognizing this connection, I decided to ask: "Can I create a method that not only infers nested topics within a given text, but also makes inferences about how those topic sets, at a given point in time, relate to the price dynamics of regional markets/indexes?"
The short answer is, "Yes, I think so."

However, I don't think it is possible using my original approach for building ontologies. Originally, I set out to explore the utility of supervised learning methods from pre-existing training data.

This included conducting workshops to gather 'expert' input for classifying documents and data sets. What I found was that this approach takes a lot of time, and once you have defined a formal structure for describing relations between digital objects, it is not clear how to connect new documents from outside your original corpus, especially if they fall outside your original domain. Moreover, different digital assets often possess different structures, and it is often unclear which elements are important.

So how do we deal with data streams that have different structures, or cover multiple domains and sources?

Hierarchical Temporal Memory (HTM) systems and sparse distributed representations (SDRs) have produced promising results in anomaly detection and in handling high-dimensional, noisy data (Webber 2015). In particular, semantic folding theory seems to provide a basic starting point for exploring alternative semantic representations. In semantic folding, the semantic representation space is spatially explicit: each term is encoded as a sparse binary pattern over a set of spatial coordinates, each of which takes a value of 0 or 1. Semantic folding and SDRs are interesting in their own right, and I am currently doing a lot of work in this area, but what really struck me about them was their relationship to current research in topology.
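To make the 0/1 spatial encoding concrete, here is a minimal sketch of comparing two SDR-style fingerprints by their overlap. It assumes a hypothetical 16 x 16 grid and stores each fingerprint as the set of its active (1) coordinates; the term fingerprints are invented for illustration and are not learned from any corpus or produced by any semantic folding library.

```python
# A minimal sketch of comparing sparse distributed representations (SDRs).
# Assumption: a hypothetical 16 x 16 semantic grid; each fingerprint is the
# set of coordinates that are 1, and every other coordinate is 0.
# The fingerprints below are made-up illustrations, not learned values.

GRID_SIDE = 16
GRID_SIZE = GRID_SIDE * GRID_SIDE  # 256 possible coordinates


def to_dense(active: set[int]) -> list[int]:
    """Expand a set of active positions into an explicit 0/1 vector."""
    return [1 if i in active else 0 for i in range(GRID_SIZE)]


def overlap(a: set[int], b: set[int]) -> int:
    """Number of coordinates that are 1 in both fingerprints."""
    return len(a & b)


def overlap_score(a: set[int], b: set[int]) -> float:
    """Overlap normalised by the smaller fingerprint (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return overlap(a, b) / min(len(a), len(b))


# Hypothetical fingerprints: shared coordinates stand for shared contexts.
drought = {17, 18, 33, 34, 49, 50, 120}
crop_failure = {17, 18, 33, 34, 65, 200}
bond_yield = {90, 91, 106, 107, 230}

print(overlap(drought, crop_failure))      # 4
print(overlap_score(drought, bond_yield))  # 0.0
```

Because only a small fraction of coordinates is active, two unrelated fingerprints rarely overlap by chance, which is why a simple set intersection can serve as a semantic similarity signal in this kind of representation.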

I had read some older research studies that employed algebraic topology, and more specifically 'polyhedral dynamics.' This prompted another question: could polyhedral dynamics (Atkin and Casti 1977) provide some insight into connecting multi-domain and structurally heterogeneous data types?
Polyhedral dynamics, or Q-analysis, was introduced as a methodological framework (Gould 1980) for exploring an interesting class of phenomena. In particular, Q-analysis offers a framework for encoding data as well-defined open sets that are related to other well-defined open sets through some relational mapping. The approach provides a way of defining multi-dimensional data relations as a set of simplicial complexes. Focusing on the simplicial complex of a data set at different 'scales' of analysis opens up a view of the structure of the data and of the flow of relations through the data set itself. This provides a potential coding scheme for connecting multi-domain data in an abstract way while also preserving much of the original data.


In the next few posts, I will go into more detail about Q-analysis and why I think it is useful for linking news traffic data and price dynamics in the market.