Showing posts with label data mining. Show all posts

Tuesday, February 22, 2011

A Hypothesis of Brain Learning Based on Scale-Free Neural Networks

A simple neural net model I wrote sometime back
I am contemplating a new framework that tries to enhance the understanding of the learning process of human brains.

I take the basic functioning of the brain to follow Hawkins' memory-prediction framework [see book "On Intelligence"]. However, I propose that the neural cells in the neocortex are connected as a scale-free network, rather than the static hierarchical structure outlined by Hawkins. Under this assumption, neural cells are not connected at birth through a "wiring diagram" found in our DNA. Rather, the connections are formed dynamically throughout our lives, especially during infancy.

Information is stored in the brain as patterns of connections among neurons. New connections keep being added over time (growth), and some neurons are more capable of attracting new links than others (preferential attachment). With these two properties, I believe that neurons form a scale-free network, i.e. the distribution of links per node follows a power law [see book "Linked"]. Thus the neural network inherits the fundamental properties of scale-free networks, such as robustness to random failures and vulnerability to the loss of highly connected hubs. (Evidence of scale-free cortical networks has been found recently by researchers.)
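To make growth and preferential attachment concrete, here is a minimal Python sketch. It is a toy model under my own assumptions, not a claim about real cortical wiring: the names `grow_network`, `degree_counts` and `links_per_node` are invented for this illustration, and every new node simply attaches to existing nodes with probability proportional to the links they already have.

```python
import random
from collections import Counter

def grow_network(n_nodes, links_per_node=2, seed=0):
    """Grow a network one node at a time with preferential attachment."""
    rng = random.Random(seed)
    # Start from a small, fully connected core.
    edges = [(i, j) for i in range(links_per_node + 1) for j in range(i)]
    # Each node appears in `endpoints` once per link it has, so a uniform
    # draw from this list picks a node in proportion to its degree.
    endpoints = [node for edge in edges for node in edge]
    for new in range(links_per_node + 1, n_nodes):
        chosen = set()
        while len(chosen) < links_per_node:
            chosen.add(rng.choice(endpoints))
        for old in chosen:
            edges.append((new, old))
            endpoints.extend((new, old))
    return edges

def degree_counts(edges):
    """Return a Counter mapping each node to its number of links."""
    return Counter(node for edge in edges for node in edge)

if __name__ == "__main__":
    degrees = degree_counts(grow_network(2000))
    print("most connected nodes:", degrees.most_common(5))
    print("how many nodes have each degree:",
          sorted(Counter(degrees.values()).items())[:5])
```

In a run like this, most nodes should end up with only a few links while a handful of early nodes become hubs, which is the heavy-tailed shape the hypothesis relies on.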

Remembering

Memory is stored as relations of sequential patterns. We remember by making associations of patterns that appear together spatially and temporally. When a child first sees a triangular object, she is aware that this is an actual object because the pixels of the triangle (or rather patterns of neuron excitations) always move together on her retina each time her eyes sample the world. These pixels are associated with each other. At the same time, the triangle is also associated with the background, the present time, its color, the sound it makes, the lighting, the people around, and everything else that is happening around her at that moment. All these associations become connections between the neurons representing these excitation patterns.

The association mechanism is based on Hebbian learning, which can be simplistically summarized as "cells that fire together, wire together."
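As a rough illustration of "fire together, wire together", here is a minimal Python sketch of a Hebbian update. The function name `hebbian_update`, the learning `rate` and the 0/1 activity encoding are assumptions made for this example only; real synaptic plasticity is far more involved.

```python
def hebbian_update(weights, activity, rate=0.1):
    """Strengthen the link between every pair of units that are active together.

    weights  -- square list of lists, weights[i][j] is the link from i to j
    activity -- list of 0/1 values, 1 meaning the unit fired this time step
    """
    n = len(activity)
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] += rate * activity[i] * activity[j]
    return weights

if __name__ == "__main__":
    # Units 0 and 1 fire together repeatedly; unit 2 stays silent.
    w = [[0.0] * 3 for _ in range(3)]
    for _ in range(5):
        hebbian_update(w, [1, 1, 0])
    print(w)
```

Only the link between the two co-active units grows; links to the silent unit stay at zero.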

Learning

Let's go back to the triangle analogy. The child is aware of the triangular object, yet she has no idea what object it is until someone tells her that this particular triangular-shaped object is called a triangle. That is when the neurons representing the triangle are associated with the name "triangle", which is itself a set of neurons representing the actual symbol of the word. These neurons are also connected to other neurons that represent the sound of the word "triangle", and any concrete facts associated with it. Learning happens whenever a new association is made.

Hawkins argues that during learning, some cells "learn" to fire when lower-level memories learn a sequence of patterns. These cells are passed to higher levels as abstract "names" of the detailed patterns. However, he did not lay out how these cells are selected in the first place. I think the naming is a natural result of association rather than explicit cell selection.

This explains why some cells are able to attract more links than others. The cells that represent abstract shapes, forms and concepts have a much larger probability of being associated with other cells. Whenever something happens, whether an image appears, a melody plays, or a train of thought proceeds, the general conceptual cells will fire if these concrete events fall into their category. Whenever this firing occurs, these conceptual cells make new associations.

Classifying

According to many studies, the brain learns to classify information into categories. As information is passed from lower to higher levels of the memory hierarchy, more and more details are filtered out, forming abstraction. I think that this classification is a natural result of the threshold logic built into each and every neuron. Each neuron collects electrical signals from all the connections along its dendrites. Whenever the collective strength of these signals becomes larger than its built-in threshold, it fires. When it fires, it propagates its own signal to all the connections along its axon.
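The threshold logic can be sketched in a few lines of Python. This is a simplified threshold unit under my own assumptions (the name `fires`, the weights and the threshold value are illustrative), not a biophysical model of a neuron.

```python
def fires(inputs, weights, threshold):
    """Return True when the weighted sum of incoming signals exceeds the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total > threshold

if __name__ == "__main__":
    weights = [0.6, 0.6, 0.6]
    # Two active inputs are enough to cross the threshold, one is not.
    print(fires([1, 1, 0], weights, threshold=1.0))
    print(fires([1, 0, 0], weights, threshold=1.0))
```

The unit does not care which two inputs are active, only that enough of them are. That loss of detail is the kind of filtering that could produce categories.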

Connectionists

This is in some sense similar to the "connectionist" theory of brain learning. However, there is a fundamental difference. Rather than dividing the brain into functional modules, or sub-networks, each responsible for one particular domain, such as name networks and image networks, I believe all neurons are able to connect to any other neuron within their physical reach. I believe that neurons make connections, or associations, dynamically according to the information provided by all sensory input.

To be continued, on the hippocampus.

Thursday, February 17, 2011

Digging Into Google Public Data Explorer

Google Public Data Explorer seems to be a very useful tool. I generated the graph below, showing which cities in Santa Clara County, California, have seen an increase in Asian population, as sampled from the ethnicity of enrolled students.

It is no surprise to see the Asian population increase in "good-school" districts like Cupertino and Fremont. More interesting is Orchard Elementary, which has shot from 20% to more than 40% in the past 5 years and emerged as one of the most heavily Asian communities. Where is it? You guessed right: that's the North Valley area of San Jose, which has seen the arrival of a new Costco, a Lowe's, tons of new townhouses, and a new shopping center currently under construction.

It would be even more interesting to use the Google API (Dataset Publishing Language) to upload my various junk data and see how they visualize. :)

Wednesday, August 19, 2009

Idea 42: Relationship Analyzer - How Close Are You And Your Friends


How many friends do you have? Do you talk to them, or rather tweet? You may not realize that the person you talk to the most, share with the most, and connect with the most may not be the one you think it is.

Have an algorithm analyze your daily communication streams (tweets, SMS, phone calls, emails) and then calculate, for each of your contacts, their degree of closeness to you.
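One very simple way such an algorithm could work is sketched below in Python. Everything here is an assumption for illustration: the event format `(contact, channel)`, the per-channel weights in `CHANNEL_WEIGHTS`, and the idea that closeness is just each contact's share of your weighted interactions.

```python
from collections import defaultdict

# Hypothetical weights: a phone call counts for more than a tweet.
CHANNEL_WEIGHTS = {"call": 3.0, "email": 2.0, "sms": 1.5, "tweet": 1.0}

def closeness_scores(events, weights=CHANNEL_WEIGHTS):
    """Map each contact to their share of all weighted interactions."""
    totals = defaultdict(float)
    for contact, channel in events:
        totals[contact] += weights.get(channel, 1.0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {contact: total / grand_total for contact, total in totals.items()}

def rank_contacts(events, weights=CHANNEL_WEIGHTS):
    """Return contacts ordered from closest to most distant."""
    scores = closeness_scores(events, weights)
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    log = [("mom", "call"), ("alice", "tweet"), ("alice", "tweet"),
           ("alice", "sms"), ("bob", "email")]
    print(rank_contacts(log))
```

A real analyzer would also want to account for recency, message length and who initiated the contact, but even this crude count can show a gap between who you think you are close to and who you actually talk to.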

Why is this useful? For a number of reasons, at least.

To set your priorities. What you want to achieve versus what you are actually doing. Have you called your mother lately? Call your mother right away.

To know yourself better. What is your biggest dream and untapped potential? If you're still unclear, your daily ramblings might give some hints.

To discover friendships, and other relationships as well. Who do you retweet the most? Who do you share the most tweet words and tweet links with? If you choose to share the meta-data extracted from your streams, that may help the matching algorithm suggest some interesting fellows to you. (like an earlier idea here)

There may well be other reasons why this can be useful. If you have one, do let me know.

photo credit loungerie

Sunday, August 16, 2009

Idea 40: Your Daily Dose Of Bible Verse, Based On Your Twitter Updates


The Bible is rich in spiritual wisdom that can help you get through the most challenging parts of life. A popular iGoogle app delivers a random verse to your desktop daily, but how do you know which verse is just the right one for your day?

Here's an idea. At the end of each day, your Twitter updates are collected, analyzed, and based on the keywords in your tweets, the most relevant Bible verses are extracted and presented to you.

In fact, not just your Twitter updates: your emails, the places you visit, the people you talk with, anything related to your day, as long as you can record it, can be used in the analysis. This is another way to produce information from our lives. (see earlier idea about personal metrics)

As the Bible says:
Ask and it will be given to you; seek and you will find; knock and the door will be opened to you. - Matthew 7:7


photo credit Andy

Tuesday, July 21, 2009

Idea #22 Personalized Recommendations Through Twitter Stream


If you use Twitter a lot, the words steadily coming out of your Tweets, aka your Twitter stream, contain a lot of information about you. They tell about your interests, location, food you eat, events you go to, opinions you have, and more. These are extremely valuable data for making highly personalized recommendations if well analysed. Most importantly, they are free and publicly accessible.

In fact, companies have started mining these data streams in the hope of providing highly differentiated services. For instance, moonit.com uses your Twitter behaviors, Tweets, followers and followees, to suggest who you should date. Your Tweets already tell who you are, so why should you fill out a lengthy psychological survey?

I would like an algorithm to recommend me the following based on my Tweets :
  • Events happening around me that are interesting to me (LikeMe.net ? No, still based on survey questionnaires)
  • People with like minds who I should meet and partner with
  • Restaurants and Cafes with just my tastes (yelp.com ? not quite.)
  • Cool gadgets that I should check out
  • News and websites that I will like (Stumbleupon.com? well maybe.)
  • and much more

These are just some simple examples. The opportunities are endless, as long as a data mining algorithm can be built to discover the associations between keywords and cluster keywords into categories, as talked about earlier in the power of association.

BTW, you should try out an extremely cool way to visualize your Twitter stream, Portwiture.com, which expresses your Tweets with Flickr photos.
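A first step toward discovering associations between keywords could be as simple as counting which words show up in the same Tweet. The Python sketch below is only that: a co-occurrence counter with made-up names (`cooccurrence`, `related_terms`) and a hand-supplied stopword list. It does not call any Twitter API and does not do real clustering.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(tweets, stopwords=frozenset()):
    """Count how often each pair of keywords appears in the same tweet."""
    pairs = Counter()
    for tweet in tweets:
        words = {w.strip(".,!?#@").lower() for w in tweet.split()}
        keywords = sorted(words - stopwords - {""})
        # Sorting makes (a, b) and (b, a) count as the same pair.
        pairs.update(combinations(keywords, 2))
    return pairs

def related_terms(pairs, term, top=3):
    """Return the keywords most often seen together with `term`."""
    scores = Counter()
    for (a, b), count in pairs.items():
        if a == term:
            scores[b] += count
        elif b == term:
            scores[a] += count
    return [word for word, _ in scores.most_common(top)]

if __name__ == "__main__":
    stream = ["Great espresso at the cafe",
              "espresso and cafe again",
              "new gadget review"]
    pairs = cooccurrence(stream, stopwords=frozenset({"the", "at", "and"}))
    print(related_terms(pairs, "espresso"))
```

Keywords that keep appearing together (here "espresso" and "cafe") are candidates for the same category, which a recommender could then match against events, places or people.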

photo credit Konstantin Sutyagin.

Saturday, January 10, 2009

Idea #16 - Predicting trading pattern through data mining

Predicting the trading decision of an individual is extremely difficult. However, predicting the decision of the entire investor community might be slightly easier if we consider a model that takes into account herd behavior, or information cascade, as discussed in idea #12.

Assume we have the buy/sell trading data of all individual investors and institutions for the past 20 years (well, it might be difficult to obtain, but let us assume we have access). We can then study the decision-making process of the investment community as a group, testing the accuracy, or predictive power, of different mathematical models for herd behavior, under both normal trading environments and extreme conditions such as those in 1987, 2001 and 2008.

Probability theory tells us that the aggregate of a group of i.i.d. (independent and identically distributed) random variables behaves approximately like a Gaussian, so the likelihood of nearly all individuals making the same decision decreases exponentially as the group grows. In a social environment, however, this is far from reality: an individual's decision can be heavily influenced by peers, and the likelihood of making an emulating or similar decision increases greatly as more people join the herd.

In other words, the probability that 99% of a bond's holders decide to dump it at the same time is astronomically small under the Gaussian assumption. However, it is a different story under an information cascade model. By how much? That will have to be tested against the trading history of the herd.
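To get a feel for the size of the gap, here is a small Python sketch comparing the two assumptions for 100 bond holders. The herd rule is my own toy assumption (each holder sells with probability equal to the share of sellers so far, starting from one notional seller and one notional holder). It is not the cascade model from idea #12 and has not been fitted to any trading data.

```python
import math
import random

def independent_tail(n, k, p=0.5):
    """Exact P(at least k of n sell) when each investor decides independently."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def herd_tail(n, k, trials=20000, seed=0):
    """Monte Carlo estimate of P(at least k of n sell) under the toy herd rule."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sellers, holders = 1, 1  # notional prior: one of each
        for _ in range(n):
            # The more sellers so far, the more likely the next one sells.
            if rng.random() < sellers / (sellers + holders):
                sellers += 1
            else:
                holders += 1
        if sellers - 1 >= k:  # subtract the notional seller
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print("independent:", independent_tail(100, 99))
    print("herd model: ", herd_tail(100, 99))
```

Under independence the chance that at least 99 of 100 holders sell is on the order of 1e-28, while under this particular herd rule it should come out at a few percent. The exact numbers depend entirely on the assumed rule, which is why the real test has to come from trading history.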


Sunday, December 14, 2008

Idea #13 - Information mining by association

The memory in our brain is associative. We associate things, events, and people that we observe at a similar place or time.

We can visualize a "triangle" as a three-sided polygon, not because we inferred it using logic, but because someone told us to associate the term and the concept at some point in our past. We know "1+1" equals "2", not because we calculated it in our head as computers do, but because we were told to associate the calculation and the result.

We tend to think that we understand the world as a set of rules and logic that governs its behavior, and that we can use this set of rules to predict its future. In fact, what we have is simply a set of observations associated in time and space. (This concept of understanding the world is worth further exploration, which I may revisit later.)

The point I want to make here is that association is powerful.

We can already make a lot of sense of the world by associating the huge amount of data from the web through similar space, time, or other metrics. Data by itself does not constitute information; it becomes information only when it is structured in a way that entails consequences of low probability. By association, the data, whether textual or graphical, is organized into a web of concepts, possibly scale-free. Thus any term entering the web will trigger a sequence of other related terms, which are highly correlated, and hopefully far from random.

This is how we may mine information from data. Information is gold.