Confessions of a Biological Scientist – Part IV: SimEureka

Posted on September 29, 2013 by @ThoughtInfected

Science has proceeded uninterrupted for hundreds of years now; through its progress we have emerged from ignorance and awakened to the reality of our Universe. But scientific advancement is now held back by a fundamental problem: the scientists. In the near future, something as important as science will no longer be left to imperfect and inefficient biological scientists, but will become the realm of digital scientists. In this fourth installment of Confessions of a Biological Scientist, I will discuss whether computers can really provide the creativity necessary for scientific discovery – can we simulate the eureka moment?

Computers are much better than any human at crunching raw numbers, and over the last couple of decades they have also come to excel at storing and recalling massive amounts of information with extremely high fidelity. Where computers still lag behind humans, though, is in their ability to form robust models from data. Thus far, computers have proven to be poor pattern recognizers, but this problem is far from intractable, and significant progress has been made in recent years.

If you have ever snapped a picture and a box magically appeared around the face of your subject, or used more advanced algorithms that sort similar faces from a batch of photographs uploaded to Google+ or Facebook, then you have interacted with computer pattern recognition algorithms. Using machine learning algorithms, computers are taking their first baby steps toward recognizing patterns in a way that will eventually allow them to truly recognize the objects they are seeing.

Computers might soon be able to differentiate a cup from a cat, but could they really leapfrog all the way to scientific discovery in the near future? Can computers really express the deep sort of creativity necessary to make scientific progress?

Well, firstly, they don’t need to, and secondly, yes, they can (maybe).

Computers don’t need to express human-like creativity to make significant scientific progress. Scientists often try to bill themselves as exceptional geniuses, bent over messy desks covered with pages of data and obscure scientific reports, coming to realize unexpected connections and innovative hypotheses. While this view is not an entirely false representation, science can be done in the exact opposite manner (and often is).

As opposed to the rapid-fire testing of exciting hypotheses, science can also be the carefully controlled collection of data following minimal perturbation of a complex system. This type of science requires no real creativity, and would require pattern recognition algorithms no more complex than those that find faces in your Facebook photos. When you change input A, what happens to outputs B through Z? This type of predictable, slow and step-wise science is already quite mechanical and is particularly susceptible to automation.
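The "change input A, watch outputs B through Z" design can be sketched as a one-factor-at-a-time perturbation screen. This is a minimal Python sketch under stated assumptions: `toy_cell` is a hypothetical stand-in for a measured system, and the outputs and their dependence on A are invented purely for illustration.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a complex system: maps input levels to measured outputs.
def toy_cell(inputs: Dict[str, float]) -> Dict[str, float]:
    a = inputs["A"]
    return {
        "B": 2.0 * a,   # rises linearly with A
        "C": 5.0,       # unaffected by A
        "D": 10.0 - a,  # falls as A rises
    }

def perturb_one_factor(system: Callable[[Dict[str, float]], Dict[str, float]],
                       baseline: Dict[str, float],
                       factor: str,
                       delta: float) -> Dict[str, float]:
    """Change a single input by `delta` and report how every output shifts."""
    before = system(baseline)
    perturbed = dict(baseline)  # copy so the baseline is left untouched
    perturbed[factor] += delta
    after = system(perturbed)
    return {name: after[name] - before[name] for name in before}

if __name__ == "__main__":
    print(perturb_one_factor(toy_cell, {"A": 1.0}, "A", 0.5))
```

Run as-is, it prints `{'B': 1.0, 'C': 0.0, 'D': -0.5}`: B rises, C is unaffected and D falls. No creativity is involved, only bookkeeping, which is exactly why this style of experiment automates so easily.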

The prototypical example of this type of science is the work of Dr. Ross D. King, who developed the robotic scientist known as ADAM. With this robot, Dr. King automated the entire scientific process of developing and testing hypotheses for the role of yeast genes in metabolic pathways (summarized here). This work provides a proof of concept that boring, step-wise science can already be done in an entirely automated manner.

While I describe the kind of science done by ADAM as boring, its implications are anything but. The fact is that the majority of science needs to be boring, careful testing of predictable hypotheses in order to disentangle the rules underpinning complex biological systems. Indeed, it might be only through the patience and consistency of a fully automated scientist that we can understand the complex interactions of the tens of thousands of proteins present in a given cell.

So if we were to hypothetically unleash robot scientists to perform this kind of simple science without any deep creative abilities, how much progress could we really make – how far does boring get us? In my view, boring science can get us (just about) everywhere we want to go.

To illustrate just how much progress could be made, let me give an example of how a cancer diagnosis could go in an age of robotic science:

You come into the lab complaining of non-specific symptoms. A cross-check against your daily biodata, together with advanced diagnostic testing, shows a clear sign of early lung cancer. You are put on a standard cocktail of drugs to slow the progression of your disease while vital information is collected. With the aid of a surgical robot, a sample of your cancer cells is then quickly biopsied via nano-surgery and sent to a computerized analysis and treatment lab (CAT-Lab).

Within this lab, your cells are analyzed directly for protein and DNA content, and within days CAT-Lab will provide information to begin improving your treatment.

Your cells are then expanded within the lab, using techniques that keep them as close to their original state as possible. These expanded cancer cells are then embedded in a standard matrix of lung cells to simulate the in vivo environment of your lung. The robotic scientist then tests a huge array of cancer drug cocktails. Through stepwise experimentation, a customized cocktail is developed to maximize cancer cell killing while minimizing damage to your healthy cells. You are immediately put on this customized combination of drugs, which quickly and cleanly eliminates the cancer from your lungs.
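The stepwise cocktail search described above could be sketched as greedy forward selection: start with no drugs, add whichever drug most improves a simulated assay score in each round, and stop when no addition helps. This is a minimal Python sketch of that technique, not a description of how a real CAT-Lab would work. The drug names, effect values and the assumption that effects simply add together are all invented for illustration; real drug combinations interact non-additively.

```python
from typing import List, Tuple

# Hypothetical per-drug effects in arbitrary units; values invented for illustration.
CANCER_KILL = {"drug1": 0.40, "drug2": 0.30, "drug3": 0.25, "drug4": 0.05}
HEALTHY_HARM = {"drug1": 0.10, "drug2": 0.25, "drug3": 0.05, "drug4": 0.08}

def assay(cocktail: List[str]) -> float:
    """Simulated well: cancer killing minus harm to healthy cells.
    Assumes effects simply add up, which real combinations rarely do."""
    kill = sum(CANCER_KILL[d] for d in cocktail)
    harm = sum(HEALTHY_HARM[d] for d in cocktail)
    return kill - harm

def stepwise_cocktail(drugs: List[str], max_size: int = 4) -> Tuple[List[str], float]:
    """Greedy forward selection: each round, add the drug that most improves the score."""
    chosen: List[str] = []
    best_score = assay(chosen)
    while len(chosen) < max_size:
        candidates = [d for d in drugs if d not in chosen]
        if not candidates:
            break
        score, drug = max((assay(chosen + [d]), d) for d in candidates)
        if score <= best_score:
            break  # no remaining drug improves the outcome; stop experimenting
        chosen.append(drug)
        best_score = score
    return chosen, best_score

if __name__ == "__main__":
    print(stepwise_cocktail(list(CANCER_KILL)))
```

With these made-up numbers the search selects drug1, then drug3, then drug2, and stops because adding drug4 would lower the score. Greedy selection is used here only because it mirrors "stepwise experimentation"; it can miss combinations that only pay off together.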

While at this point your cancer has more than likely been completely taken care of, CAT-Lab has also performed an analysis of the likelihood of recurrence in your case. The analysis suggests that it would be economical to develop a customized “vaccine” rather than run a small risk of the cancer recurring. A clear signature of surface protein expression for your cancer cells has been identified. Targeting this combination of markers, a nanobot swarm that will instantly kill any cell bearing this signature is developed and placed as a rice-sized nodule in your shoulder. Should your cancer recur, these nanobots will rapidly eliminate the cancer cells before they can cause disease.

So you can see that scaling current-generation scientific techniques up to the level of automation could really lead us to miraculous ends. While the scenario I describe above might sound like science fiction, it is absolutely possible in an age of automated science and does not require any special science to be invented beyond what can already be done today. While the extent of the science done might be greater or less than what I have described, depending on the efficiency and efficacy of such robotic laboratories, I absolutely foresee this type of analysis beginning within the next decade.

The side benefit of this kind of approach to customized and automated medical analysis would be that it also provides reams of data for “publication”. Taking that data and figuring out what kind of major discoveries lurk inside these huge data sets requires more creativity than would be provided by robotic scientists like ADAM or the CAT-Lab. Boring science can get us to the future, but it won’t ever revolutionize the way we understand the world. So can computers ever recapitulate the creativity necessary for scientific leaps like those exemplified by great theories such as evolution or relativity?

Can we create a robot genius?

The answer to this question is much deeper and harder than the question of whether robots can do basic science. In my view, creativity is a natural extension of our everyday pattern recognition abilities, combined with a bit of randomness and taken to the logical extreme. What we call genius is simply the recognition of a highly unexpected pattern in the world around us. When we can extrapolate some new information across an opaque gap between old and new knowledge, we call it genius.

So if you can accept my view that our pattern recognition abilities naturally lead to creativity, then it seems obvious that exponentially progressing computers, which can already recognize faces, will soon match and eventually surpass us on the creative front as well. So can a computer do science? The answer is unequivocally yes, and this will be enough to get us to the future. But can a computer be creative enough to be the next Einstein? Maybe – we will just have to wait and see on that one.

Confessions of a Biological Scientist – Part III: The Natural Digitization of Science

Posted on September 15, 2013 by @ThoughtInfected

Science has proceeded uninterrupted for hundreds of years now; through its progress we have emerged from ignorance and awakened to the reality of our Universe. But scientific advancement is now held back by a fundamental problem: the scientists. In the near future, something as important as science will no longer be left to imperfect and inefficient biological scientists, but will become the realm of digital scientists. In this third part of Confessions of a Biological Scientist, I will discuss the increasingly digital nature of modern science, and why I see a natural progression towards completely digital scientists.

How do you determine the truth of something? If I were to say something like “All dogs have hair”, what actions would you take to determine the truth of this statement?

Your first course of action with a simple statement such as this is to devise a counter-example and test whether it exists. In the case of my statement that all dogs have hair, the existence of a hairless dog would be adequate to disprove it. So you must seek a “hairless dog”. Your next step, then, is to consult a knowledge repository of some sort to see if you can find evidence for a hairless dog.

If it were 100, 50, or even 20 years ago, you would probably have started with a book. You might have had a set of encyclopedias in your home, in which case you would turn to the index and look up the term “dog”, where you might find dogs described as a “furry friend of man”. At this point you might conclude that all dogs have fur, but being particularly cunning, you might also check the index for the term “hairless dog”. And what if your search were to turn up nothing? For the time being, within the given dataset, you would have had to conclude that my statement was tentatively true.

Now, it being 2013, you would take a very different tack in addressing my question. You would simply go to a search engine and type in the term “hairless dog” to see if any such thing exists. Google being as large a data set as we have access to at this point, you can trust that if you can’t find a hairless dog on Google, then it probably doesn’t exist. But you would, of course, quickly be directed to the Wikipedia page for “Hairless dogs”, and you would see that there are actually several breeds listed as hairless.

So at this point you would conclude that my statement was false, and you might devise a counter-statement: Most, but not all, dogs have hair.

You have just performed a scientific action. You have taken a hypothesis, devised a counter-example, tested it against a given dataset, and then revised the statement based on this new data. This example also shows just how quickly and efficiently digital technologies let us test a claim against the largest data set ever assembled (i.e., the internet).
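The test described above (state a universal claim, search a dataset for counter-examples, revise the claim if any turn up) can be sketched in a few lines of Python. The mini "knowledge base" below is hypothetical: the breed names are real, but the records are illustrative stand-ins for a search, and the resulting percentage describes only this toy dataset, not dogs in general.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")

def find_counterexamples(items: Iterable[T], claim: Callable[[T], bool]) -> List[T]:
    """Return every item that violates a universal claim ("all X have property P")."""
    return [item for item in items if not claim(item)]

# Hypothetical mini knowledge base standing in for an encyclopedia or search engine.
dogs = [
    {"breed": "Labrador Retriever", "has_hair": True},
    {"breed": "Beagle", "has_hair": True},
    {"breed": "Poodle", "has_hair": True},
    {"breed": "Border Collie", "has_hair": True},
    {"breed": "Xoloitzcuintli", "has_hair": False},
]

counterexamples = find_counterexamples(dogs, lambda d: d["has_hair"])
if counterexamples:
    # A single counter-example is enough to reject the universal statement.
    share = 1 - len(counterexamples) / len(dogs)
    print(f"'All dogs have hair' is false; revised: {share:.0%} of records have hair")
else:
    # An empty result only means the claim survives within this dataset.
    print("'All dogs have hair' survives, tentatively, within this dataset")
```

Note the asymmetry the encyclopedia example already hints at: finding nothing leaves the claim only tentatively true, while one hit is enough to overturn it.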

This move to digital, searchable knowledge has been a revolution for assessing truth, and has become central to the modern scientific method.

As a scientist, I test scientific statements in exactly the same way all of the time. In fact, I would never set out to actually do an experiment without first testing it against the literature. As an example, let’s say I think that the interaction of two particular proteins might be causing a certain type of cell to die. I must first check Google to see what other work has been done on this.

I would examine the peer-reviewed body of scientific knowledge (known generally as “the literature”) and see how much evidence there is to support my particular hypothesis. It is mostly on the basis of this result (and my own unreleased data) that I decide whether to proceed with the experiment. I do a sort of cost-benefit analysis, weighing the originality of the experiment against the likelihood of it working.

In one situation, I might find nothing to indicate that this interaction occurs, or that it has any connection to cell death (marked as A above). Unless I already have some strong evidence of my own to suggest that this effect is occurring, I would take the lack of any mention of it in the literature as evidence that it is not occurring. Based on this, I would throw away the hypothesis and look in another direction. This experiment is extremely original, but it is very unlikely to work.

Aside: There is an additional reason not to perform such an experiment when there is no supporting evidence in the literature. Even if the experiment did happen to work, the fact that no one else had ever suggested this effect would mean that I would have to go to extra lengths to prove my hypothesis. Going out on a limb is only worth it for a particularly juicy hypothesis; risky experiments are only worth it if they will lead to a big finding and an important scientific impact.

Another possibility might be that I find there already exists overwhelming evidence that these proteins interact and cause this effect (marked as B above). In this case, the experiment is very likely to work, but it is totally unoriginal and of little value to advancing scientific knowledge. I would simply be repeating experiments whose results are already known. Thus, in this situation I would again likely not perform the experiment; rather, I would simply cite the interaction of these proteins as scientific fact and move on in a different direction.

Only if an experiment falls into the sweet spot between being completely unknown and completely known would I feel that the hypothesis is justified and move on to testing it. Scientists are constantly on the lookout for facts that are balanced between true and false, where we can inject a bit of data to tip them in either direction. While the popular image of a scientist is someone in a white lab coat with a beaker in hand, the reality is just as likely to be someone hunched over a keyboard, testing hypotheses using digital tools.
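One way to make the "sweet spot" concrete is a toy cost-benefit score. The model below is an assumption for illustration, not a formula from the literature: let `evidence` be the fraction of prior literature supporting a hypothesis, take the likelihood of success to be `evidence` and the originality to be `1 - evidence`, and score the experiment as their product. Case A (nothing known) and case B (already established) both score zero, and the score peaks halfway between.

```python
def experiment_value(evidence: float) -> float:
    """Toy cost-benefit score for a hypothesis (hypothetical model).

    evidence = 0.0 means nothing in the literature supports it (case A);
    evidence = 1.0 means it is already established (case B).
    Assumes likelihood of success rises with evidence while originality falls.
    """
    if not 0.0 <= evidence <= 1.0:
        raise ValueError("evidence must lie in [0, 1]")
    likelihood = evidence
    originality = 1.0 - evidence
    return likelihood * originality

if __name__ == "__main__":
    grid = [i / 10 for i in range(11)]
    best = max(grid, key=experiment_value)
    print(best, experiment_value(best))  # 0.5 0.25
```

A real researcher would weight originality and likelihood differently, for example favouring long shots for a "particularly juicy" hypothesis. Adding an exponent to either factor would shift the peak away from 0.5, but the basic shape stays the same: zero at both extremes, with a maximum somewhere in between.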

If we think of these types of searches not as casual googling but as digital experiments, we quickly realize that more of science already takes place in the digital world than in the physical one. As this trend continues into the future and becomes increasingly automated, it is only a matter of time until we replace scientists altogether.

So if so much of my work is already a matter of examining digital literature to perform simple testing of hypotheses and cost-benefit analyses to determine what hypotheses might be most advantageous to follow, then what is preventing a computer from performing similar analyses?

The major hurdle that I see computer systems facing today is digesting scientific data that we have encoded in natural human language. As it stands, computers are only beginning to be able to understand what we humans are really saying, but examples like IBM’s Watson show that we are making rapid strides toward this end. As computers start to understand the meaning of natural language, I do not think it will be long before a computational strategy develops to identify the low-hanging fruit of science.

Even as things stand today, we should perhaps ask how much of this work is really being done by the scientist. If Google is using advanced search techniques based on deep learning algorithms to link ideas and thoughts together, drawing on a staggeringly vast data set the likes of which a single human could never comprehend, then how much of modern science is already being done by these algorithms?

It is indisputable that computers are already helping us find the experiments that best hit the sweet spot between originality and likelihood of working. Just as with everything else, the scientific world is naturally becoming more and more digitized. Soon enough, instead of us asking the computers whether this or that experiment is a good idea, the computers will start telling us which experiments to do. After that, it is only a matter of time before they cut out the error-prone scientists altogether.

Next week I will discuss how we scientists, with our lossy natural language and inadequate annotation, are holding back this natural digitization of science, and how we can start to help put ourselves out of a job. Watch this video, and let’s meet back here next week.

——–

Interested in writing for Thought Infection? I am looking for contributors and guest bloggers. If you have an interest in futurism and you would like to try your hand at writing for a growing blog, send me an email at thought.infected@gmail.com

Confessions of a Biological Scientist – Part II: The Organic Engine of Science

Posted on September 8, 2013 by @ThoughtInfected

Scientific advancement is the greatest endeavour that humans have ever undertaken, but this is not to say it is perfect. Science as a system evolved rather than being designed, and thus it suffers from all of the warts, inefficiencies and limitations of any organic system. In the near future, something as important as science will no longer be left to imperfect and inefficient biological scientists, but will become the realm of digital scientists. In this second part of Confessions of a Biological Scientist, I will discuss the imperfections of the scientific system and why artificial intelligence may be necessary to overcome our collective limitations.

Despite what the current dogma of scientific boosterism might say about it, I do not believe that humans are born as scientists. Yes, we are naturally curious. Yes, we naturally ask questions and recognize patterns. But taking these raw abilities and turning them into scientific thought requires a set of tools that are not necessarily natural to the human mind. It is for this reason that we have developed elaborate systems of education, research and publication in order to propagate scientific thought.

The modern system of scientific education is a long and arduous process. To become a scientist, one must first have a broad base of knowledge from which to build towards scientific thought. A good understanding of math, physics, and chemistry is an absolute must for any scientist. After this, a student must immerse themselves in a specific field, and eventually an even more specific discipline, for many years before they are adequately educated in the technical and academic knowledge necessary to work in that field. All told, it takes at least 25 years and hundreds of thousands of dollars to turn a single curious child into a PhD graduate.

While it does succeed in producing more new science graduates every year, the scientific education system is far from efficient. In today’s world, the age at which a scientist can expect to get their first post as an independent academic researcher has been steadily increasing. As we pour an increasing amount of energy into training scientists, it would seem that the scientific education system as a whole is actually becoming less efficient over time. Could this process realistically compete if an AI emerged with even a low level of scientific ability?

This process, whereby a system can actually lose efficiency over time, is a characteristic flaw of organic systems. If there is inadequate selective pressure to maintain or increase efficiency, then over time the system may tend towards inefficiency, accruing errors that are never eliminated. Just as the human population has accrued costly maladaptations over time, such as poor eyesight, obesity, and various genetic diseases, the scientific system also carries with it negative traits that are copied from one generation of scientists to the next.

And educational inefficiency is not the only, or even the worst, of the maladaptations found within the scientific system.

If we hold that the ideal of science is the quest for pure truth, then the ideas that best fit the data should be held to be the best. Unfortunately, this is not the case. Human communication has evolved from storytelling, and science is no exception. Scientists are often more interested in what provides the most captivating story and resonates with current scientific paradigms than in what is the simplest truth or best-fit model.

This tendency of scientists to converge on popular scientific notions is worsened by the publication arm of the scientific machine. Peer-reviewed scientific journals are ranked according to their impact factor, a metric based on the average number of citations received in a given year by the papers a journal published over the previous two years. Just as in politics or business, the goal of the science game is to be popular.
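For reference, the standard two-year impact factor reported in the Journal Citation Reports is, roughly, a ratio of recent citations to recent papers. For a journal in year Y, with C_Y the citations received in year Y by anything the journal published in years Y-1 and Y-2, and N_{Y-1}, N_{Y-2} the numbers of citable items (mainly research articles and reviews) it published in those years:

```latex
\mathrm{JIF}_{Y} = \frac{C_{Y}}{N_{Y-1} + N_{Y-2}}
```

As a purely hypothetical worked example, a journal that published 120 citable items in 2011 and 80 in 2012, and whose output from those two years drew 600 citations during 2013, would have a 2013 impact factor of 600 / (120 + 80) = 3.0. Nothing in the formula measures whether the cited work turned out to be correct; it only counts how often it was cited.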

Journals want to publish work that meets the standards of scientific rigor, yes, but even more important, they want to publish work that will be popular. But what determines which science will be most cited? It is not necessarily what is best for the advancement of science, but simply what is most interesting to scientists. This pressures scientists to make their reports more interesting to scientific audiences, skewing scientific writing towards grandstanding. On some level, science is simply a form of highly controlled entertainment, serving a very specific audience a very specific product.

Science might be better served by pairing carefully observed data with simple conclusions and measured insights. All too often, it is expected that scientific reports present striking new data with exciting conclusions and deep new insights. Rather than letting the data speak for itself, science must be packaged up neatly and sold one PowerPoint slide and one scientific manuscript at a time.

This bombastic style leaves little room for unanswered questions, encouraging scientists to avoid discussing the potential pitfalls of their research and to gloss over holes rather than address them. No longer is the scientist expected to report data impartially; instead, we must be salesmen shilling our own observations. We are encouraged to act as lawyers, advocating for our own stories in a battle between ideas.

Through this process of selling and re-selling science, we are perpetuating the false perception that our scientific data is perfect, our conclusions unquestionable, and our insights complete. Yes, science is moving forward, but in shuffling steps, not leaps and bounds.

The funding system for science further exacerbates the problems created by the scientific publication system, because it is, in effect, simply an extension of it. Grant applications are ranked by scientists on their merit, but merit is really a function of how well the ideas in the grant resonate with current scientific thinking, and how interesting those ideas are to the panel. The idea that a scientist could slave away for decades on a niche problem, which may or may not be of interest to the scientific community at large, seems a quaint notion from a lost generation.

Ultimately, we lack any objective measure of the value of a scientific idea. This means that the one with the best story wins. Even with our shiny armour of raw skepticism, we are still just as vulnerable to a good story as the rest of humanity. In my mind, this is the fundamental limit we face today: human science can only advance as quickly as scientific groupthink can haltingly step from one paradigm to the next (see Kuhn).

In the end, it may be that these negative traits are not just inefficiencies that can be cut away from the scientific system, but an expression of our human imperfection. We are inefficient, political and error-prone beings, and so, by extension, our science and our scientific systems are inescapably inefficient, political and error-prone. Human science is an organic machine, complete with imperfections and limitations.

Perhaps the only way to overcome the current limits to scientific advancement will be to remove humans from science altogether. With the advancement of artificial intelligence, it may soon be possible to create a new type of science, in which our collective advancement is not limited by the whims of human minds. Next week I will discuss the first baby steps of robot scientists and how I imagine these scientists will begin to replace us, and ultimately the entire scientific system, over the next two decades.

——–

Confessions of a Biological Scientist – Part I: The Limits of Meat

Posted on September 1, 2013 by @ThoughtInfected

Science has proceeded uninterrupted for hundreds of years now; through its progress we have emerged from ignorance and awakened to the reality of our Universe. But scientific advancement is now held back by a fundamental problem: the scientists. In the near future, something as important as science will no longer be left to imperfect and inefficient biological scientists, but will become the realm of digital scientists. In this first part of Confessions of a Biological Scientist, I will discuss the limitations of biology, and why science must soon leave us behind.

I am a scientist. I spend my days in a manner that is likely more or less consistent with how you might imagine a scientist would: researching, devising, performing, and analysing experiments to test new hypotheses about how the world works.

Notwithstanding the current challenge of obtaining and maintaining funding for scientific projects, being a scientist is a pretty good gig. Science offers a chance to do a job where you can truly embrace your creative side. Scientists also get to travel the world to present data and meet fascinating people. Most importantly for me though, being a scientist lets me do something that I know is making a positive impact on the world.

What has been bugging me lately, though, is the question of just how much value I am really adding to this equation. I am not calling into question the value of science itself; that argument is easily dispatched with a look at the magical world which science has revealed. What I am questioning is the value of scientists, or to be more specific, the value of human scientists.

Before delving into the meat of my argument, I must make the obvious disclaimer that currently there is no alternative to employing scientists. If we want to do science, it is absolutely a necessary evil that we must deal with the biological and social inefficiencies of humans in order to take steps towards scientific progress. We have no choice but to keep feeding these meat-scientists for now, but I see a juggernaut on the horizon: a new kind of scientist is coming, and it will make me nothing more than a child in a sandbox.

The profound deficiencies of human science are actually much deeper than they might at first seem; the biological requirements of being human are more than simply a cost of doing science, they are an active deficit against it.

The first limitation of our biological bodies is that we are provided with only a limited set of five senses by which we can absorb data. Even with five senses, we are so reliant on our gelatinous orbs (eyes) that we insist on converting all data into visual graphs, tables, and pictures. This obligatory photocopying of data into a form amenable to visual digestion is a lossy process, and it biases our understanding towards phenomena that can be understood visually. As someone who spends a lot of time making neat little PowerPoint models to communicate new scientific findings, I am very familiar with both the power and the limitations of visual understanding.

While some scientific phenomena seem to have somehow transcended the visual world (I am thinking specifically of mathematical and physical discussions of higher-dimensional space), the limitations of our biological brains are still an ultimate barrier to our understanding of natural phenomena. In order to understand something, we must have some comparable a priori understanding on which to draw.

We grope for an analogy by which we can explain what is happening. Electrons flow like water, proteins fold like origami, and beehives act like a single organism. Ultimately, this need to explain new phenomena through pre-existing ones limits us to only step-wise advancement in science. We cannot leapfrog over ideas which we do not yet understand.

Even if we do have the pre-existing understanding to appreciate a phenomenon we are witnessing, our ability to identify the underlying mathematical relationships is highly limited. We are great at seeing a linear trend, and maybe we can pick out the odd logarithmic relationship, but we are hopelessly inept when it comes to seeing the complex algorithms of the world.

In cellular biology, we are particularly guilty of reporting on naturalistic phenomena while glossing over the mathematical relationships that underpin the systems we study. We produce an endless supply of scientific reports full of experimental data, hypotheses, and neat little models, but it is the rare exception that contains a mathematical equation.

For an idea of what I am talking about, check out the cell biology or biochemistry sections of top-level scientific journals like Science or Nature. What proportion of papers would you suppose contain a mathematical equation? Yes, there are statistical tests of various relationships, but this is all too often the entirety of the mathematical analysis in a paper. When this protein goes down, that other one goes up in a statistically significant manner, but that is usually as far as we go.

Although this might seem a harsh criticism of the natural sciences, there are good reasons why there is so little math in biology, and it is certainly something that I myself am guilty of. The problem stems from the fact that biological systems are highly complex, non-linear systems. While biological brains are readily capable of understanding how two or maybe three factors interact, we have no means of understanding how the thousands of factors involved in a single cell are interacting.

I would propose that biological humans will never be able to understand complex biological systems with mathematical precision. It will require a new kind of scientist which can see the complex mathematical relationships and account for the thousands or millions of factors which interact in a given cell, a given body, or a given society. It will require a computer scientist.

Computers will make better scientists because they are not subject to the limits of human biology. To a computer all data is mathematical. There is no need for intermediating steps to convert data into a simplified visual form. Computers can collect data from any number of imaginable sensory devices and perform mathematical analyses directly upon this data.

Computers also have no need to understand why something is as it is. While this is often cited as a weakness of computers, it can also be seen as a positive. A computer could theoretically identify the multi-variate mathematical relationships that rule a complex system with no need to understand why this relationship exists. A computational analysis of a complex system would reveal properties about how things are, not why they are that way.
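Ordinary least-squares regression is about the simplest instance of what this paragraph describes: the computer recovers the weights linking several factors to an output without any model of why those weights are what they are. This is a minimal, standard-library-only Python sketch. The three factors, their "true" weights and the noise level are invented, and real cellular systems are non-linear, so this shows only the linear toy case.

```python
import random
from typing import List, Sequence

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_linear(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) w = X^T y, no intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic "cell": one output driven by three hypothetical factors plus measurement noise.
random.seed(0)
true_weights = [2.0, -3.0, 0.5]
rows = [[random.gauss(0, 1) for _ in true_weights] for _ in range(500)]
y = [sum(w * x for w, x in zip(true_weights, r)) + random.gauss(0, 0.05) for r in rows]

weights = fit_linear(rows, y)
print([round(w, 2) for w in weights])  # close to the invented true_weights
```

The fitted weights describe how the output depends on each factor, and nothing in the procedure says why. That is the "how things are, not why" distinction above, in its most basic form.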

Aside: In scientific discussions, this point might spark a correlation-versus-causation argument, but I have always felt that causation may be nothing more than a highly statistically significant correlation. We never really know that there are no other factors that could be causative in the system. With enough data, I am convinced that an adequately intelligent computer system could identify relevant mathematical relationships underpinning natural systems that are every bit as good as, or better than, what humans have devised.

Simply put, my argument is this: the world is made of math, and computers will make better scientists because they are better at math than we are.

John Henry died with a hammer in his hand when he went up against the steam drill; scientists will soon be up against the steam-scientist, and it will be either get out of the way or die with a pipette in your hand. Until now, it has never been possible to envision any type of scientist other than a human one. We simply had to settle for suppressing the non-scientific elements of our being and becoming the best possible scientists we could be. To accomplish this, we developed elaborate systems of scientific education, research and publication. In my next post, I will delve into the inefficiency of the wider scientific system and why technology represents an imminent threat to the entire house of cards.

——–

Thought Infection

A blog about the future and everything else
