Saturday, March 30, 2013

Post-docs and individual exploitation

There's a wonderful interview in this week's Science Careers with Ed Lazowska, a computer scientist and policy expert at the University of Washington. The interview is about careers in computer science, especially at the PhD level. I encourage you to read it if you are in or considering entering any STEM field. I think it foreshadows an important trend in both science and scientific careers.

One particular response that stood out in the interview addressed a question concerning the number of individuals presently earning PhDs in computer science. The end of his reply was:
I do think we need to be cautious. We need to avoid the overproduction—and, honestly, exploitation—that characterizes other fields. Hopefully we'll be smart enough to learn from their behavior.

What's interesting is that Lazowska has identified the overproduction of PhDs in some STEM fields with exploitation. I believe he's claiming that other fields use the competition for limited faculty and industrial positions to obtain cheap labor, such as in the form of the post-doctoral position.

In other words, the culture of a particular field may dictate that one must perform multiple post-docs as the way to get a faculty position. However, this is just exploitation in disguise: promise someone a faculty position, but only if they first work for you, for lousy pay and benefits.

If you are a PhD student, it is important to recognize this dynamic early. I'm not saying that you shouldn't try for a faculty position if you really want one, but realize that the motives behind the institution of the post-doc may be more than simply helping you gain experience.

I for one am currently applying to post-doc positions, since they are a good fit for me. Ultimately, it comes down to critically thinking about the best position for yourself and what would make you happiest.

Wednesday, March 27, 2013

1000 scientists determine genes linked to common cancers

A rather neat and important example of the shift towards big science driven by data has been reported in this Guardian article. It explains how a recent large-scale study in the UK has linked faults in the DNA of thousands of individuals to an increased likelihood of developing prostate, breast, and ovarian cancers, which are some of the most prevalent and dangerous forms of cancer.

I suspect that the most important problems that mankind faces will be most effectively solved in this manner, combining the efforts of many individuals of differing expertise to mine large banks of data for meaningful correlations.
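To get a feel for the arithmetic behind such case-control association studies, here is a toy example of my own (the allele counts are invented for illustration, not taken from the study): a 2x2 table of risk-allele counts in patients versus healthy controls, scored with a hand-rolled Pearson chi-square statistic and an odds ratio.

```python
# Hypothetical allele counts (illustration only, not data from the study):
# rows are cases and controls, columns are risk allele and other allele.
table = [[1200, 800],    # cases:    risk allele, other allele
         [1000, 1000]]   # controls: risk allele, other allele

def chi_square(t):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    row = [sum(r) for r in t]
    col = [sum(c) for c in zip(*t)]
    total = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            stat += (t[i][j] - expected) ** 2 / expected
    return stat

# Odds ratio > 1 suggests the allele is more common among cases.
odds_ratio = (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])
print(f"chi-square = {chi_square(table):.1f}, odds ratio = {odds_ratio:.2f}")
```

A real genome-wide study repeats a test like this at hundreds of thousands of loci, which is why it takes thousands of samples (and a correspondingly strict significance threshold) to find trustworthy correlations.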

I also wonder if the Information Age will lead to new advances in how data is collected or generated. Most of what I read about data-centric science assumes that the data we need to solve a problem is already available somewhere in some database connected to the net. But the fundamental hypothesis of science is that our models must match observations, so making a number of observations, in my mind, should come before anything else.

I expect that how we perform our observations and measurements will change as the Information Age matures, not just how we process our data.

Tuesday, March 26, 2013

A satisfying definition of emergence (at least, for me)

Building a bit off of yesterday's blog topic concerning biology, mathematics, and complexity, I wanted to note a satisfying and simple explanation of emergent phenomena in P. W. Anderson's "Physics: The Opening to Complexity."

To paraphrase, emergent phenomena are not logical consequences of underlying physical laws. In other words, one can't deduce the behavior of, say, monarch butterflies from the laws of quantum electrodynamics. However, emergent behavior cannot contradict the physical laws upon which it is built.

Monday, March 25, 2013

Mathematics has a new friend: biology

In 2004 Joel E. Cohen wrote an article in PLoS Biology entitled "Mathematics Is Biology's Next Microscope, Only Better; Biology Is Mathematics' Next Physics, Only Better." This short article, which has been on my "To Read" list for a long time, is a brief history and assessment of the contributions that each field has made, and continues to make, to the other.

As the title suggests, half of Cohen's claim is that the problems in modern biology will fuel the development of new mathematics. This mathematics should address the fundamental questions posed by biologists, such as "How does it work?" and "What is it for?" There are six such questions, and they are further divided according to the many, many orders of magnitude of space and time that are spanned by biological processes: from molecular biology to global ecosystems and from ultrafast photosynthesis to evolution. Scale-dependent systems and emergent phenomena are the primary themes in modern biological problems.

To illustrate his idea of mathematics, Cohen paints a picture of a tetrahedron with the topics of data structures, theories and models, algorithms, and computers and software located at the four vertices. Each topic affects the others and has certain strengths for addressing the problems of biology. No weaknesses in the current state of mathematics are mentioned per se, but the real weakness is likely that, for some problems, the appropriate mathematics simply doesn't yet exist.

For Cohen, mathematics is decidedly applied mathematics. I doubt he has much to say about topics with no direct relevance for biological applications.

The article is divided into past, present, and future. Cohen first goes into a brief review of the historical interplay between math and biology, starting with what I think is an excellent example: William Harvey's discovery of the circulation of the blood. Just enough background is given to appreciate how novel and unexpected this discovery was. Notably, empiricism, aided by calculations, was in its infancy during Harvey's time. Cohen then pays homage to several other co-developments of math and biology, some of which are nicely summarized in the article's Table 1.

For present matters, Cohen notes that issues of emergence and complexity should lead to great discoveries in mathematics. What is notable here is that emergence in biological systems at one particular level of organization is driven by events at both lower and higher levels. For example, both the genes of an organism and evolution determine many aspects of species. Cohen also provides an example of recent research that marries ideas from statistics, hierarchical clustering, and cancer cell biology. This example is a bit difficult to follow, but I think it is a good illustration of the interplay he is discussing. (To be fair, I was reading the article in an airplane flying through some turbulence, so it was difficult to give this section my full attention.)
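Since I can't reproduce the study's method from memory, here is only a toy version of the hierarchical-clustering ingredient: a naive single-linkage agglomerative clusterer that repeatedly merges the two closest groups until a target number of clusters remains. The "expression profiles" are made-up 2D points.

```python
import numpy as np

def single_linkage(points, k):
    """Naive agglomerative clustering: repeatedly merge the two closest
    clusters (single linkage: distance between clusters is the minimum
    pairwise point distance) until only k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a].extend(clusters[b])  # merge b into a
        del clusters[b]
    return clusters

# Two well-separated groups of made-up "profiles" -> two clusters
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(single_linkage(pts, 2))
```

In a study like the one Cohen describes, the rows would be tumor samples and the coordinates gene-expression levels; the resulting dendrogram groups samples with similar molecular signatures.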

The article finishes with a future outlook for his thesis and very briefly presents some ethical problems and opportunities for the continued correspondence between the two fields. I didn't find this section terribly insightful.

This article and those similar to it can't help but make me feel like the Age of Physics is near an end. The problems that occupy most practical people's minds today seem to be concerned with complexity. Physics, which is concerned with constructing models based on the simplest possible assumptions, is by its very nature a difficult tool for understanding phenomena that emerge from the entangled interactions of many heterogeneous parts. Biology just happens to be one field that can push forward our understanding of complex systems. Computation, information science, and neuroscience are other fields that will help further mathematics.

Physics will always be important, but the domain of natural phenomena in which it finds itself useful is shrinking as the Information Age comes into full swing.

Tuesday, March 19, 2013

Understanding the correlations between model parameters of speckle

Today I read "Structural correlations in Gaussian random wave fields," an old PRE by Freund and Shvartsman. The authors analytically found the existence of correlations between the amplitude and phase gradients in random electromagnetic fields commonly known as speckle. Notably, while the amplitude and phase are not correlated at a given point, the amplitude is correlated with the gradient of the phase. Higher amplitudes are usually found alongside smaller phase gradients, and vice versa.

What's not clear to me is if this treatment works for vector fields or only scalar fields. Notably, I'm not sure what phase means for a random vector field.

Perhaps the authors make the assumption that the components of the vector are independent and thus a scalar treatment is sufficient, but I'm not sure that this is so.
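Out of curiosity, I sketched a quick numerical check of the scalar case (my own code, not the authors'): synthesize a Gaussian speckle field by band-limiting complex white noise in Fourier space, then correlate the amplitude with the magnitude of the phase gradient, computed without unwrapping as Im(E* grad E)/|E|^2. The band-limit radius and grid size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256

# Smooth complex Gaussian random field: complex white noise, low-pass
# filtered in Fourier space (a common way to synthesize fully
# developed speckle; the cutoff 0.05 is an arbitrary choice).
noise = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
k = np.fft.fftfreq(N)
KX, KY = np.meshgrid(k, k, indexing="ij")
mask = (KX**2 + KY**2) < 0.05**2
E = np.fft.ifft2(np.fft.fft2(noise) * mask)

amplitude = np.abs(E)

# Phase gradient without unwrapping: grad(phi) = Im(E* dE) / |E|^2
dEx = np.gradient(E, axis=0)
dEy = np.gradient(E, axis=1)
gx = np.imag(np.conj(E) * dEx) / amplitude**2
gy = np.imag(np.conj(E) * dEy) / amplitude**2
grad_phi = np.hypot(gx, gy)

r = np.corrcoef(amplitude.ravel(), grad_phi.ravel())[0, 1]
print(f"correlation(amplitude, |grad phi|) = {r:.3f}")
```

The correlation coefficient comes out negative, consistent with the paper's claim: the phase gradient blows up near the zeros of the amplitude (the optical vortices), while bright regions have slowly varying phase.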

Wednesday, March 13, 2013

Computation and theory are now better defined

I've written several times in an attempt to find the reasoning that leads many authors in optics to present computations (a.k.a. simulations, or numerics) as arguments for the validity of an analytical model for a set of observations. Usually, the computations merely "print the results" of some theory, which I think necessarily eliminates their ability to independently confirm that theory. I have argued in the past that the computations usually add nothing new to the paper but merely fill it with trivial content. However, I know this isn't always the case and haven't found a good reason for why some computations just don't add much to a paper.

This time I will try to tackle this problem by starting with a question: what is the difference between computation and theory? If I feel that presenting both as arguments in a paper is redundant, then finding the differences (or lack thereof) between them ought to highlight exactly why this is so.

Let's start by considering that computation is only as good as the assumptions that go into it. For example, many-body solvers may use a set of approximations about the microscopic entities in a system, such as assuming that molecules are hard spheres. For this reason, any phenomenon that requires the molecules to interact in a manner other than as hard spheres cannot be simulated with this assumption [1].
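A minimal sketch of what the hard-sphere assumption looks like in practice (illustrative parameters of my own choosing, not any particular solver): in a Monte Carlo many-body simulation, a trial move is accepted only if no two spheres overlap. Any physics that depends on softer interactions is invisible to this model by construction.

```python
import math
import random

# Hard-sphere Monte Carlo sketch in 2D. Box size, sphere diameter,
# step size, and particle count are arbitrary illustration values.
random.seed(1)
L, sigma, step, n = 10.0, 1.0, 0.3, 20

# Build a random non-overlapping initial configuration.
spheres = []
while len(spheres) < n:
    p = (random.uniform(0, L), random.uniform(0, L))
    if all(math.dist(p, q) >= sigma for q in spheres):
        spheres.append(p)

def trial_move(i):
    """Displace sphere i; accept only if it stays in the box and
    overlaps no other sphere (the hard-sphere assumption)."""
    x, y = spheres[i]
    new = (x + random.uniform(-step, step), y + random.uniform(-step, step))
    ok = (0 <= new[0] <= L and 0 <= new[1] <= L and
          all(math.dist(new, q) >= sigma
              for j, q in enumerate(spheres) if j != i))
    if ok:
        spheres[i] = new
    return ok

accepted = sum(trial_move(random.randrange(n)) for _ in range(1000))
print(f"accepted {accepted}/1000 trial moves")
```

The point of the sketch is the rejection rule: the entire interaction physics lives in one boolean test, so phenomena requiring, say, attractive forces simply cannot appear in the output.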

Assumptions like these, however, are also central to deriving many physical theories. If theory and computation are based on the same assumptions, then there's probably little difference between the two, but experience still tells us that there is some difference.

To probe deeper, consider the difference between function and procedure as described in Structure and Interpretation of Computer Programs:
The contrast between function and procedure is a reflection of the general distinction between describing properties of things and describing how to do things, or, as it is sometimes referred to, the distinction between declarative knowledge and imperative knowledge. In mathematics we are usually concerned with declarative (what is) descriptions, whereas in computer science we are usually concerned with imperative (how to) descriptions.
SICP, Section 1.1.7
Based on this quote, I think that a theory is a declaration about how some variables or parameters relate to one another [2]. It is no different than the idea of "function" described above. On the other hand, a computation typically takes some input parameters and produces an output, like reading a URL in a web browser and displaying the content of that webpage onto the computer screen. A computation, in this sense, also relates different parameters to one another. Again, there is little difference between the two.
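SICP's own running example, the square root, makes this contrast concrete. Declaratively, sqrt(x) is the y >= 0 such that y*y = x; that statement relates quantities but gives no recipe for finding y. Imperatively, Newton's method is a procedure that computes y by a series of steps. A minimal version in Python:

```python
import math

def newton_sqrt(x, tol=1e-12):
    """Imperative (how-to) knowledge: iterate y <- (y + x/y)/2 until
    y*y is close enough to x. The declarative definition of sqrt
    says nothing about these steps."""
    guess = x if x > 1 else 1.0
    while abs(guess * guess - x) > tol * x:
        guess = 0.5 * (guess + x / guess)
    return guess

print(newton_sqrt(2.0))   # approximately 1.4142135623...
print(math.sqrt(2.0))     # the same value via the library routine
```

Both the declaration and the procedure pin down the same relation between x and y; only the means differ, which is exactly the point at issue.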

However--and I think this is the key point--the means by which a computation obtains the outputs is through a series of steps, not through some statement or equation.

Let's recap so far. Computations and theories are very much alike. They relate parameters to one another and are many times based on the same assumptions. As arguments for an explanation for some set of observations they are both tools. The means by which they relate parameters, though, are different.

With this I think I've finally arrived at what bothers me about the use of computation in so many journal articles. The difference between theory and computation lies in the "means" or the "how-to." But as arguments, they communicate the exact same "end," and are therefore redundant.

The choice between presenting an analytical theory or a simulation often comes down to which is simpler and produces results that are easier to understand. Additionally, some problems are simply better-suited to one approach or the other.

A colleague of mine suggested that both may be included in a journal article because some types of people better understand theory while others better understand a computation.

Finally, I admit that I started writing this thinking there was some deeply hidden distinction between theory and computation. The problem was in how I defined theory and computation. I better defined them in footnote 1, which I wrote after writing the bulk of this entry. Computation and analytical theories are two ways of exploring the results of a model. In this case, models, computations, and analytic theories form a hierarchy with models located above the other two. In other words, computations and analytical theories are both types of models.

[1] Admittedly, one could argue here that a computation is just one way to arrive at the results of a theory, which would therefore make computation a subset of theories. In this case, a better distinction would be drawn by contrasting analytical theories, which are those expressed by equations and derived from mathematical principles, and computational models. Both of these are subsets of just "models." When I use the word "theory" in this essay, I usually mean analytical theory.

[2] The question as to what is a variable or a parameter is also important, since not all the parameters in a theory are necessarily measurable. I think this point is subtle, yet completely obvious given some thought. For example, typically voltages and currents are measured in electromagnetism, not electric or magnetic fields. For the time being, I think that a parameter is some quantity that either a) is measurable, or b) is not measurable but aids in the construction of a theory.

Tuesday, March 12, 2013

Notes on "Biological Physics," Part II

I finished the review of "Biological Physics" today, which included the sections on bioenergetics, forces, and single-molecule experiments. I skipped the section on reaction theory because I am not familiar with the topic and it didn't interest me as much as the others.

There are two primary topics in bioenergetics at the biomolecular level: charge transport and light transduction. Charge transport refers to the process by which isolated charges travel amongst different sites in a complex molecule. This process is inherently quantum mechanical, since electrons and holes may actually tunnel into different sites in the molecule, depending on the molecule's conformation.

Light transduction refers to the conversion of energy in a photon to chemical or electronic energy. A paragraph is dedicated to human vision and the photo-induced isomerization that is central to its operation, but the rest of this sub-section is devoted to photosynthesis.

During photosynthesis, "antenna systems" in the chlorophyll molecules capture light energy, which is transferred to other parts of the plant cell along excited molecular states, much like in Foerster resonance energy transfer. The transfer is so fast that the quantum mechanical coherence of the excited states likely plays a role. It seems that most of the work on this topic up to the time the article was written had been performed by theorists.

The various forces in the cell are typically treated as "effective" forces; models usually neglect the fundamental electromagnetic nature of the primary forces in the cell. At the protein level, enzymes may actually pull apart the covalent bonds in "violent" events. It's also been hypothesized that mechanical vibrations in the form of solitons can propagate along the covalently-bonded protein backbone, but this is strongly debated.

The transmission of forces through a heterogeneous medium, like the cell membrane, is also a topic of study.

Finally, single-molecule studies are gaining prominence as experimental techniques become more refined, but "the challenge of studying individual protein molecules is still very much in its infancy... The key is to use extreme dilution so that only a single biomolecule is in the reaction volume."

Much single-molecule work has been done on DNA because it is simple and readily obtained. Spring-like forces in DNA are both enthalpic, which means they depend on the energy change due to deformation of electronic orbitals, and entropic, which means the DNA resists changing its shape due to interaction with its thermal environment.
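A back-of-envelope sketch of the entropic part (my own illustration using standard textbook numbers, not figures from the review): modeling double-stranded DNA as a freely jointed chain of N Kuhn segments of length b gives a low-force entropic spring constant k = 3*kB*T/(N*b^2), so the stiffness comes from temperature, not from bond deformation.

```python
# Entropic spring constant of an ideal (freely jointed) chain,
# k = 3 kB T / (N b^2), applied to lambda-phage DNA.
# All numbers are common textbook values, not taken from the review.
kB = 1.380649e-23        # Boltzmann constant, J/K
T = 300.0                # room temperature, K
b = 100e-9               # Kuhn length of dsDNA, ~2 x 50 nm persistence length
L_contour = 16.5e-6      # contour length of lambda DNA, ~16.5 micrometers
N = L_contour / b        # number of Kuhn segments (~165)

k_spring = 3 * kB * T / (N * b**2)
print(f"entropic spring constant ~ {k_spring:.2e} N/m")
```

The result is on the order of 1e-8 N/m: stretching the molecule by a micrometer costs only femtonewton-scale forces, which is why optical tweezers can resolve the entropic regime and why the stiffness grows with temperature, the signature of an entropic rather than enthalpic spring.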

In the conclusion, the authors anticipate that problems relating to the brain lie ahead as major areas of work in biological physics.

It would seem that the experimental study of proteins remains a major challenge to biological physics, but also is perhaps the most worthwhile to pursue. Photosynthesis, the effects of a protein's environment on its folding and charge transport, disordered protein behavior, and the forces between parts of proteins are not very well-understood. If there are new discoveries to be made, then I think they lie in protein dynamics.

Monday, March 11, 2013

Manatees and cabbage palms

My fiancée K and I paid a visit yesterday to Blue Springs Park, a Florida state park which is just outside Orange City, Florida. Of course, the highlight of the park is the manatees, which swim up the spring to warm themselves during the winter months in water that is a near constant 72 degrees F. The warm water comes up from the Florida aquifer, a very large pool of underground water situated in porous rock below the state. One unfortunate result of the porosity of this rock is that sinkholes may develop rapidly and unexpectedly below buildings.

On this trip K showed me a type of tree known as the cabbage palm, whose scientific name is Sabal palmetto. This tree is very common in the central Florida area and has been very important for people living in Florida since the time of the Native Americans. Its leaves were interwoven to provide roofs for shelters and its trunks provided lumber to early settlers.

The cabbage palm lies very low to the ground for a long period of time in its youth while saving energy stores. At some particular time (I'm not sure when this occurs in its life cycle), it shoots up rapidly to its full height. The reason for this behavior is that much of the central Florida ecosystem relies on fire for its maintenance. The cabbage palm is relatively resistant to fire in both its low-lying state and as a very tall palm tree. At intermediate heights, however, when its upper trunk is exposed, it can be killed by the frequent fires in the area. For this reason, it must grow quickly or succumb to the flames.

This quick-growing behavior reminds me of the yagrumo tree that I encountered in Puerto Rico.

Thursday, March 7, 2013

Improve your writing: write to a newspaper

I submitted an editorial piece to the Orlando Sentinel recently in my continuing efforts to improve my writing and expose myself to other forms of publishing. This article was published yesterday and may be found here:,0,933320.story. It concerns the use of bicycles as a form of transportation near the UCF campus.

One thing I learned from this experience was that newspapers prefer much shorter paragraphs than scientific publishers. I originally had three or four paragraphs in the 400-word essay; the editor turned it into nine. I suppose having multiple paragraphs makes the article easier to read and allows the reader more opportunities to "abandon" the article once they've started reading.

I also learned that newspaper editors won't ask writers if it's OK to perform edits beyond simply breaking a piece of writing into paragraphs. I was a bit disappointed that a few phrases were cut from my original article since they argued for a few points that I felt were important. However, the overall message of the article is more clear in its published form since it is not obfuscated by too many arguments.

Overall, I'm pleased with the experience, and I'm contemplating how to move forward with increasing bike awareness in Orlando.

Friday, March 1, 2013

Considering the value of a PhD... now that I almost have one

I've lately been looking into reasons for people getting PhDs, job placements and outlooks, income levels, etc. I'm doing this partly as a response to the question of "what have I done with my life these past five and a half years?" I'm also just curious what other people think on the topic.

Here is a nice discussion by a PhD holder and academic, on his blog, about getting a PhD in physics. His advice: get a PhD in physics because you want to be a graduate student for five or six years.

Well, this is advice I've never heard before.

Notes on "Biological Physics," Part I

There is an article from 1999 in Reviews of Modern Physics entitled "Biological Physics." This review summarizes research during the twentieth century where "physics has influenced biology and where investigations on biological systems have led to new physical insights." The exchange of ideas between the two fields has not been of equal magnitude, the authors note. Many tools from physics have found their way into the biological sciences, though some biological systems have led to new physics, usually in the form of providing experimental testbeds for new physical theories. The article is primarily concerned with molecular biological physics.

The seven primary sections of the review are
  1. The structures of biological systems
  2. Complexity in biology
  3. Dynamics, mostly within proteins
  4. Reaction theory, where biology has provided testbeds for new physical theories
  5. Bioenergetics
  6. Forces
  7. Single-molecule experiments.
Most of the interesting ideas I've found so far in the article are associated with the complexity and dynamics of biomolecules. Particularly, there is an idea known as the principle of minimal frustration. From the Wikipedia article,
"This principle says that nature has chosen amino acid sequences so that the folded state of the protein is very stable. In addition, the undesired interactions between amino acids along the folding pathway are reduced making the acquisition of the folded state a very fast process. Even though nature has reduced the level of frustration in proteins, some degree of it remains up to now as can be observed in the presence of local minima in the energy landscape of proteins."
This idea came from a theory of energy landscapes for proteins that was developed by Bryngelson and Wolynes. In language I'm more familiar with, the potential energy landscape of these molecules has a fractal-like structure; in Section III of the article the authors state that
"The kinetic observations suggest that the energy landscape might have a hierarchical structure, arranged in a number of tiers, with different tiers having widely separated average barrier heights."
It seems like structural determination of proteins and other biomolecules has become something akin to bookkeeping. The tools exist and are refined to find static structures, like neutron scattering and NMR. Additionally, the energy landscape theory for protein folding seems to be mature at this point as well. So what open-ended questions still exist in biological physics? After reading up to section V, I've compiled the following grand problems in biological physics as I've interpreted them from this paper only:
  1. "A synthesis that connects structure, energy landscape, dynamics, and function has not yet been achieved." This seems to suggest that there is some degree of incoherence between these individual fields of study, so ideas that link them together are required.
  2. Biochemists can now synthesize their own proteins, but can they do this in a useful manner, for, say, molecular and microscopic engineering purposes?
  3. Sensing and characterizing phase transitions, especially in glassy systems, could lead to better experimental investigations into protein folding.
  4. "Understanding protein folding can improve the capability of predicting protein structure from sequence." Apparently there's a lot of DNA sequence information, but predicting what proteins come from it is nontrivial.