Student Retreat 2016: Big Bear Edition

This is a guest post from Tanya Phung!! Thanks Tanya!!

-Rob

We had our third annual retreat a couple of weeks ago in Big Bear Lake, California, amazingly organized by two second-year students, Kimberly Insigne and Rebecca Walker.

Following the pizza party on the first night, the first four presentations (each 15 minutes in length) were held. We learned about compound heterozygotes from Rob Brown, fine-scale mapping from Gleb Kichaeve, mutation variation from myself, and methods to analyze RNA-seq data from Zijun.

On the second day, some of us got to take a cooking lesson from Eleazar, who also generously made breakfast for all of us. After that, we started the next set of presentations, where we learned about missing heritability from Huwenbo Shi, how to apply linear regression analysis to learn about cancer survival from Chengyang Wang, and how to find evolutionarily conserved regions in a Hidden Markov Model framework from Adriana Spearlea. Then, we all walked into downtown Big Bear for lunch. When we got back, we had our last set of presentations from Shan Sabri on Drop-seq, Aliz Rao on predicting the mutational load of a gene, and Kimberly Insigne on library design to study promoters in E. coli.

Overall, we had talks on a variety of topics and everyone was at a different stage of their research. Presenting at this retreat is such a valuable experience because it is neither like presenting the completed and polished work at a conference nor like presenting at lab meetings where everyone in the lab has an idea of what you are working on. Most of us are in the early stages of our Ph.D. and so presenting our work to an audience that is mostly unfamiliar with our research topics and getting feedback from faculty members has been a very useful exercise.

In addition to stimulating talks, we had lots of free time: some of us explored the lake near the cabin, some went hiking, and some played beer pong. Most of the time, we are pretty caught up with classes, research, and meetings, so the retreat was a good way to get to know other members of our program whom we normally do not see.

On the morning of the final day, we held a Bioinformatics program discussion, where we brought up issues that could be changed to improve our experience in the program. This is one unique aspect of our program: the faculty members are really open to our ideas and suggestions to make the program better. After that, we packed up and headed back to Los Angeles after a weekend filled with stimulating talks, fun, and bonding with our colleagues.

Below are some photos:

Bioinformatics students and faculty

DSC_0453lq

Dinner on the second night.

DSC_0446lqDSC_0444lqDSC_0443lqDSC_0442lq

 

The first (hopefully) annual QCBio Retreat!!

Sometimes you need to retreat a few steps before charging out ahead. That is what we did this past weekend.

UCLA has a long history of research in quantitative and computational biology. However, it did not have a dedicated program until a few years ago with the establishment of the Bioinformatics PhD.  A few years into this, the graduate students started this blog! But we did not have the critical mass of contributors to keep it going.

At the QCBio retreat we realized that not only has the bioinformatics program grown, but so has the number of people doing research that is appropriate for this blog, so it is high time we get it going again! This blog is no longer going to be written just by bioinformatics students, but rather by any graduate students in QCBio, and possibly some post-docs as well.

One idea that will be reflected directly on the blog is the “Open Questions” page. This is going to be a list of research questions that people have proposed but don’t currently have the time or resources to pursue, and would be very happy to collaborate on. What I like most is that this can give undergraduates an opportunity to do research as well, with graduate student mentors! We hope this will help shape our blog into not just a place to post about what we are doing, but a place to find research partners as well.

We also decided to expand our weekly student meeting (“Bioinformagics”, not “Student Journal Club” as the faculty seem to call it) from just bioinformatics graduate students to also include graduate students in QCBio! This is a great meeting; it is student organized and student run. We set it up after we decided during our first student retreat (also student run and organized) that it would be very beneficial. It is one hour a week and an opportunity for us to present our work and get both scientific and presentation-related feedback, as well as a time to throw out new ideas, collaborate, brainstorm, and problem solve.

This was a great QCBio retreat! Great presentations, fantastic posters, casino night, restarting the blog and expanding the student meeting! I for one am excited for the year to come!

The first annual Bioinformatics retreat was awesome.

DSC_1592

We had our first annual Bioinformatics Retreat this past spring! It was held at the Boone Center of the USC Wrigley Institute on Catalina Island, where the hosts were wonderful (including the Island foxes), and the balance of work and play was perfect.

On the first day, we unpacked and promptly boarded some kayaks so we could head to Two Harbors and play some beach volleyball. It was a close match between Team Superior and Team Inferior. Team Inferior won, although Team Superior had the clear MVP. Unfortunately, we don’t have photos from this day because no one wanted to let their camera go for a swim.

On the second day, we had a series of awesome student talks. Because we had fewer than 20 people in attendance, student talks were an hour long and informal, allowing for discussion and questions throughout rather than just for a few minutes at the end. We not only received research suggestions from the audience, but also critiques of our presentation style. This was a great help, as most of us are in the early stages of our Ph.D. and we don’t typically get this feedback at retreats or conferences. For dinner we went to the only restaurant in Two Harbors, followed by the only bar in Two Harbors, and fun was had by all.

Stay tuned for a link to the abstracts from our talks.

Here are some photos for your viewing pleasure:

DSC_1606

USC’s beautiful harbor, just a short kayak ride or walk away from Two Harbors

DSC_1576

Enjoying Kai’s talk

DSC_1609

Hiking to the only restaurant in Two Harbors

IMG_0364

Dinner in Two Harbors. Their clam chowder is THE BEST

IMG_0373

On the hike home from dinner we saw shooting stars!

IMG_0287

Bogdan with his game face on

IMG_0297

A normal group shot…

IMG_0299

… and naturally, a silly one.

Efficient forward simulation of whole genomes

Simulations are used extensively in population genetics, both for verifying analytical results and for exploring models that are mathematically intractable.  Simulation approaches can be broadly classified into forward-in-time and backward-in-time.

While backward-in-time (e.g. coalescent) simulations are extremely memory efficient for simulating neutral loci, it is difficult to simulate selection with them.  On the other hand, forward simulations allow very flexible selection models, but tend to be memory intensive.  This is because the standard approach to forward simulation stores all the variants carried by individuals in an array, whether or not those variants contribute to individual fitness.  With this approach, the memory used to store a population of diploid individuals is 2NL bits, where N is the population size and L is the number of loci stored, assuming a single bit is used to store a variant (i.e. biallelic loci).  For a computer with 16 GB of memory, this translates to 10,000 individuals with 6.4 Mbp of sequence per individual, or 1 million individuals with 64 kbp per individual.
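The arithmetic behind those two figures can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `max_loci` is made up for illustration, and it assumes 16 GB means 16 × 10^9 bytes and that all memory goes to storing variants.

```python
def max_loci(memory_bytes: int, population_size: int) -> float:
    """Loci per individual that fit in memory when storage costs 2*N*L bits."""
    total_bits = memory_bytes * 8
    # Solve 2 * N * L = total_bits for L.
    return total_bits / (2 * population_size)


MEMORY_BYTES = 16 * 10**9  # assumption: 16 GB taken as 16e9 bytes

for n in (10_000, 1_000_000):
    loci = max_loci(MEMORY_BYTES, n)
    print(f"N = {n:>9,}: about {loci:,.0f} loci per individual")
```

With these assumptions the loop reports 6,400,000 loci (6.4 Mbp) for 10,000 individuals and 64,000 loci (64 kbp) for 1 million individuals, matching the numbers above.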

An alternative approach is to represent individual chromosomes as mosaics of haplotypes from a founder population.  This approach is extremely memory efficient, allowing simulation of whole genomes of large populations.  I have written a forward simulator that implements this approach, called forqs (Forward simulation of Recombination, Quantitative traits, and Selection):

http://bioinformatics.oxfordjournals.org/content/30/4/576

https://bitbucket.org/dkessner/forqs
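To give a feel for the mosaic idea, here is a minimal Python sketch. It is not the forqs implementation: the data layout (a sorted list of `(start position, founder id)` chunks) and the names `founder_at` and `crossover` are assumptions chosen for illustration. The point is that a chromosome is stored as a handful of breakpoints rather than as one entry per locus.

```python
from bisect import bisect_right
from typing import List, Tuple

Chunk = Tuple[int, int]    # (start position, founder haplotype id)
Chromosome = List[Chunk]   # sorted by start; each chunk runs until the next start


def founder_at(chrom: Chromosome, position: int) -> int:
    """Return the founder haplotype that covers `position`."""
    starts = [start for start, _ in chrom]
    return chrom[bisect_right(starts, position) - 1][1]


def crossover(left: Chromosome, right: Chromosome, cut: int) -> Chromosome:
    """Build a child that copies `left` before `cut` and `right` from `cut` on."""
    child = [chunk for chunk in left if chunk[0] < cut]
    child.append((cut, founder_at(right, cut)))
    child.extend(chunk for chunk in right if chunk[0] > cut)
    # Merge neighbouring chunks that come from the same founder.
    merged = [child[0]]
    for chunk in child[1:]:
        if chunk[1] != merged[-1][1]:
            merged.append(chunk)
    return merged


founder_a = [(0, 0)]   # whole chromosome copied from founder 0
founder_b = [(0, 1)]   # whole chromosome copied from founder 1

child = crossover(founder_a, founder_b, 500)   # [(0, 0), (500, 1)]
grandchild = crossover(child, founder_a, 800)  # [(0, 0), (500, 1), (800, 0)]
print(grandchild, founder_at(grandchild, 600))
```

In this layout the storage per chromosome grows with the number of recombination breakpoints, not with the number of loci, which is why whole genomes become feasible. Actual variant values only need to be looked up in the founder haplotypes when they are required.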

forqs allows explicit modeling of quantitative traits and selection based on individual trait values.  Part of the motivation for writing the simulator was to simulate artificial selection experiments.

I’ve recently finished a simulation study using forqs to evaluate the power of artificial selection experiments to identify quantitative trait loci (QTLs) underlying a quantitative trait.  Here’s a link to the manuscript pre-print on bioRxiv:

http://biorxiv.org/content/early/2014/06/04/005892


Bioinfo retreat rained out

We got rained out for the first time in L.A. when our Catalina Island retreat got rescheduled due to a forecast of thunder, lightning, 15 ft waves, and waterspouts. (Would’ve made for a great story to see those, though, don’t you think?) To make the best of it, we all went out and had a great time at BJ’s in Westwood. Their pizza always hits the spot!

BJs_022814

We’ll reschedule the retreat to sometime in April or May, which will be great, since it will be warmer and we’ll have great weather, guaranteed for that time of the year in California. :)

Local Ancestry?

Where are you from?  I mean, genetically?

A lot of work in my lab involves genetic ancestry.  One project in particular done by Wen-Yun Yang in our lab involved taking a person’s genetic data and using it to pinpoint where that person is from on a map.  A few weeks ago he successfully defended his PhD! We are all very excited for him!

But I am asking a different question, not where are you from, but rather, where is your DNA from?  In an individual from England, chances are that their entire DNA is from Europe. A person from Japan most likely has their entire DNA from Asia and a person from Kenya will likely have DNA from Africa.  But what about an African American? Their DNA is from both Africa and Europe (and often a bit from indigenous people from North America); we call individuals with DNA from multiple populations admixed individuals.

Slide02

Due to a genetic randomization process called recombination, the genome of an African American will be a mosaic of ancestry. Some stretches of their genome will be of only African ancestry or only European ancestry, and other regions will be of both European and African ancestry.

A locus is a particular location in the genome (e.g., ‘c’ is at the third locus of the word “locus”). Unfortunately for genetic researchers, the genome is not conveniently colored red and blue according to ancestry, so my research has focused on an extremely fast and accurate method to determine the locus specific ancestry (also called local ancestry) of an admixed individual.  More on that later, but first, why do we care about knowing this?

When we know the locus specific ancestry of one individual, we can tell that person about their heritage and about where their ancestors may have come from. However, the main reason we want to determine locus specific ancestry in individuals is so that we can help determine personalized medical treatments and risk assessments. Many diseases have different levels of risk and prevalence in different ethnic populations, so knowing which ancestry an individual has at a risk location in the genome is very important. When looking at many admixed individuals, patterns in the ancestry mosaic can give us interesting clues as to the population history, migration rates and cultural dynamics of the interacting populations.

But how do we determine the locus specific ancestry of an individual?  Recent advances in the field allow researchers to literally read every base pair. These are the As, Cs, Ts and Gs that stand for the nucleotides that make up the genome; in other words, the letters in the novel that is our genome. It used to be that only about a million of the 3 billion base pairs were read, but now we can read all of them.

From one individual to the next the genome doesn’t change very much. Differences occur when, at a base pair position, one individual has one nucleotide (e.g. A) and another has a different one (e.g. T).  We call these differences single nucleotide polymorphisms (SNPs). A few years ago we could only gather information on the SNPs that are common, such as one where 90% of the population has a T and the other 10% has a G.

Now that we can read all of the SNPs, we have determined that many of them are very rare and are only seen in certain ethnic groups. We find that there are some SNPs that are seen in individuals from one continent, but never in individuals from any other continent. We call these continent-specific variants (CSVs).   While still not as good as color coding the genome for us, there are enough of these variants that they can quickly and easily be used to determine the locus specific ancestry of an individual. In the example below, the black lines represent the true ancestry of a hypothetical African American. The blue and green marks represent CSVs that were observed in that individual.  Because humans are diploid, having two copies of each chromosome, there are three possible ancestry combinations: having both copies of a locus from Africa, having both copies from Europe, and having one copy from each. From the pattern of CSV marks, it is easy to infer the true ancestry.  Since you can do this with your eyes, we constructed a simple statistical model (a hidden Markov model) which very quickly and accurately determines the ancestry of admixed individuals.

Screen Shot 2014-02-04 at 2.35.16 PM
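For readers who like to see the idea in code, here is a toy hidden Markov model in Python. It is only a sketch and not the model from my research: the windowing of the genome, the Poisson emission model, and every number in it (`rate`, `noise`, `stay`) are assumptions made up for illustration. Each window of the genome is summarized by how many African-specific and European-specific CSVs were observed, the hidden state is the number of European copies (0, 1 or 2), and the Viterbi algorithm finds the most likely sequence of states.

```python
import math

STATES = (0, 1, 2)  # hidden state = number of European-ancestry copies in a window


def log_poisson(k, rate):
    """Log probability of seeing k events under a Poisson(rate) distribution."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def log_emission(state, afr_count, eur_count, rate=2.0, noise=0.05):
    """Assumed model: each copy contributes `rate` CSVs of its own ancestry per window."""
    afr_rate = rate * (2 - state) + noise  # noise keeps the rate above zero
    eur_rate = rate * state + noise
    return log_poisson(afr_count, afr_rate) + log_poisson(eur_count, eur_rate)


def viterbi(windows, stay=0.95):
    """Most likely ancestry state per window, given (afr_count, eur_count) pairs."""
    switch = (1 - stay) / (len(STATES) - 1)

    def log_trans(a, b):
        return math.log(stay if a == b else switch)

    scores = [math.log(1 / len(STATES)) + log_emission(s, *windows[0]) for s in STATES]
    back = []
    for afr, eur in windows[1:]:
        step, new_scores = [], []
        for s in STATES:
            best = max(STATES, key=lambda p: scores[p] + log_trans(p, s))
            step.append(best)
            new_scores.append(scores[best] + log_trans(best, s) + log_emission(s, afr, eur))
        back.append(step)
        scores = new_scores
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    return path[::-1]


windows = [(4, 0)] * 3 + [(2, 2)] * 3 + [(0, 4)] * 3
print(viterbi(windows))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The high probability of staying in the same state plays the role of recombination being rare: a single window with one stray European CSV, such as `(2, 1)` in the middle of a run of `(4, 0)` windows, is not enough to make the model switch ancestry.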

With this method, researchers can more quickly and effectively investigate theories about population movements, changes, and histories; develop personalized genomics techniques; and better understand the recombination process and how it shapes our genomic landscapes.

How I landed an internship at Illumina

To summarize it in a single sentence, and as any career-advice panel will tell you, it’s all about networking.

Step 1. Make your LinkedIn profile as nice as you can

On your LinkedIn profile, include a lot of search terms that recruiters might be searching for, e.g. “Interests include… Specialties:…” And here, list everything you can think of! On my profile, I listed everything from course titles to programming languages and bioinformatics tools. Also, note that whereas endorsements do not count at all, recommendations count a LOT. (This was a tip from a recruiter.) Ask colleagues or professors to write recommendations for you on LinkedIn. I had to bug multiple people multiple times until I got one. Getting recommendations will be hard because not many people are active on LinkedIn, but I imagine a professor might eventually write one for you. It’s easier than writing a recommendation letter, after all.

Step 2. Network at conferences

Whenever I go to conferences, I make a note of every booth that has to do with my area of expertise (read: a place that might possibly hire me eventually). Then, I go around the booths, converse with the representatives to learn about the companies (read: try to make a memorable impression), and collect business cards. When I get home after the conference, I find everyone I can on LinkedIn and connect with them. The result is that I’m connected to a lot of people in the bioinformatics industry. Eventually, when recruiters do searches, I’m in many people’s second- and third-degree networks, so I pop up in their search results and they email me.

Step 3. Ask to get introduced

I found out that a lab-mate knew someone who worked in the Illumina booth. We got introduced, and bingo, I had the connection to the company I wanted to work at. (If you don’t know anyone, ask people if they know anyone. Ask everyone you know!)

Step 4. Keep those connections alive

Don’t just make/accept a connection. Include a personal note (“It was great meeting you at ASHG”), or ask a question and start a conversation. (“I noticed you worked at ABC. How did it compare to being in academia?”) Use every opportunity to remind your connections about who you are. Send out emails to them when you publish something, or send season’s greetings emails around the holidays. It might sound spammy, but you want them to think of you the next time their buddy asks for a recommendation to fill that certain bioinformatics position!

Step 5. Go for informational interviews

Go to lunch with people in industry, or take them out for coffee. When I went to San Diego for a conference, I made sure to set up informational interviews during my lunch hour. Online networking is great, but personal connections are a hundred times better. Prepare questions ahead of time, and remember, it is a no-no to ask about a position during an informational interview, unless they bring it up first. Don’t worry, if you made a good impression, the leads will come!

Do you have any additional tips for landing a bioinformatics job or internship? Share your thoughts in the comments!

Greetings from ASHG/ICHG 2013

The International Congress of Human Genetics (ICHG), organized by the American Society of Human Genetics (ASHG), is one of the largest conferences of its kind. This year it was held in Boston, MA, and it was great to hear talks about everything genetics, network, and explore a new city.

20131023_123305
This is the view of the company booths in the main exhibition hall of the Convention Center. The thousands of posters are around the edge of the room. This gives you an idea of the scale of the event! All throughout the day, there are talks going on simultaneously in 10 rooms, and it would take you 15 minutes to walk from one room to the other, if they were at opposite sides of the building.

20131024_181338
View of downtown Boston when we leave the building.

20131023_231208
The highlight of the week is always the Illumina party. This year, some well-known scientists started dancing on the stage with the singer.

Besides the fun, I learned a lot, and now I feel extra motivated about getting back to research!

Which journal to publish in?

Here are some guidelines that might help in deciding which journal to publish in, and what to look out for.

1. High impact factor

Check out this post for an awesome summary: http://massgenomics.org/2013/01/high-impact-journals-genetics-and-genomics.html. These are the charts from that post that are most useful to us:

Top general or medical journals:

| Journal | ISSN | Impact Factor | 5-Year Avg. | Articles | Influence |
|---|---|---|---|---|---|
| NEW ENGL J MED | 0028-4793 | 53.30 | 50.08 | 349 | 21.30 |
| LANCET | 0140-6736 | 38.28 | 33.80 | 276 | 13.61 |
| NATURE | 0028-0836 | 36.28 | 36.24 | 841 | 20.37 |
| CELL | 0092-8674 | 32.40 | 34.77 | 338 | 20.55 |
| SCIENCE | 0036-8075 | 31.20 | 32.45 | 871 | 17.53 |
| JAMA | 0098-7484 | 30.03 | 29.68 | 220 | 13.11 |
| NAT MED | 1078-8956 | 22.46 | 26.42 | 187 | 12.16 |
| PLOS MED | 1549-1277 | 16.27 | 15.38 | 126 | 6.30 |
| J CLIN INVEST | 0021-9738 | 13.07 | 15.43 | 402 | 6.95 |
| PNAS USA | 0027-8424 | 9.68 | 10.47 | 3614 | 4.90 |
| BMC MED | 1741-7015 | 6.04 | 5.77 | 114 | 2.11 |
| PLOS ONE | 1932-6203 | 4.09 | 4.54 | 13781 | 1.80 |

Top genetics and genomics journals:

| Journal | ISSN | Impact Factor | 5-Year Avg. | Articles | Influence |
|---|---|---|---|---|---|
| NAT REV GENET | 1471-0056 | 38.08 | 31.36 | 71 | 16.96 |
| NAT GENET | 1061-4036 | 35.53 | 33.10 | 196 | 17.58 |
| GENOME RES | 1088-9051 | 13.61 | 12.49 | 208 | 7.16 |
| GENE DEV | 0890-9369 | 11.66 | 12.79 | 236 | 8.02 |
| PLOS BIOL | 1545-7885 | 11.45 | 13.63 | 180 | 7.84 |
| AM J HUM GENET | 0002-9297 | 10.60 | 11.72 | 162 | 5.87 |
| TRENDS GENET | 0168-9525 | 10.06 | 8.99 | 60 | 4.48 |
| GENOME BIOL | 1474-7596 | 9.04 | 7.90 | 151 | 4.13 |
| PLOS GENET | 1553-7390 | 8.69 | 9.17 | 548 | 5.11 |
| CUR OPIN GEN DEV | 0959-437X | 8.09 | 8.04 | 105 | 4.38 |
| SCI TRANSL MED | 1946-6234 | 7.80 | 7.81 | 216 | 4.11 |
| HUM MOL GENET | 0964-6906 | 7.64 | 7.51 | 463 | 3.16 |
| MOL THER | 1525-0016 | 6.87 | 6.28 | 230 | 2.02 |
| MUTAT RES-REV | 1383-5742 | 6.46 | 7.92 | 21 | 2.42 |
| J MED GENET | 0022-2593 | 6.37 | 5.67 | 131 | 2.28 |
| BMC BIOL | 1741-7007 | 5.75 | 5.84 | 48 | 2.78 |
| HUM MUTAT | 1059-7794 | 5.69 | 5.85 | 200 | 2.35 |
| MOL BIOL EVOL | 0737-4038 | 5.55 | 9.86 | 297 | 3.84 |
| DNA RES | 1340-2838 | 5.16 | 5.28 | 42 | 2.06 |
| EVOLUTION | 0014-3820 | 5.15 | 5.61 | 285 | 2.43 |
| HUM GENET | 0340-6717 | 5.07 | 4.18 | 137 | 1.61 |

Top technology and informatics journals:

| Journal | ISSN | Impact Factor | 5-Year Avg. | Articles | Influence |
|---|---|---|---|---|---|
| NAT BIOTECHNOL | 1087-0156 | 23.27 | 28.16 | 84 | 12.95 |
| NAT METHODS | 1548-7091 | 19.28 | 20.45 | 128 | 11.15 |
| BIOINFORMATICS | 1367-4803 | 5.47 | 6.05 | 707 | 2.61 |
| PLOS COMPUT BIOL | 1553-734X | 5.22 | 5.84 | 407 | 2.72 |
| BRIEF BIOINFORM | 1467-5463 | 5.20 | 7.75 | 65 | 2.86 |
| BMC BIOINFORMAT | 1471-2105 | 2.75 | 3.49 | 557 | 1.32 |
| BMC BIOTECHNOL | 1472-6750 | 2.35 | 3.08 | 126 | 0.92 |

2. Do you have what the journal is looking for?

I like to browse through current online issues to make sure the journal publishes the type of article I have. It’s also essential to run your choices by a colleague. For example, I was planning to publish about a correlation between certain types of variants and psychiatric disease. I was considering the following journals, but after hearing the comments, I had to cross them out:

  • J Biol Chem – I’d need to do molecular follow-up of my findings
  • Mol Psychiatry – need strong evidence of causality, e.g. same variant found in multiple families
  • Am J of Medical Gen – would need to be more focused on case reports

In the end, we went with PLOS ONE, for which my manuscript is a much better fit.

Do you have any additional suggestions for determining where to publish? Post in the comments!

 

Hello, world!

Welcome to talkinggenomes! This is a blog run by Bioinformatics Ph.D. students at UCLA. So, why did we start a blog? Well, there are a few reasons.

1. Science, obviously. We enjoy discussing awesome scientific topics that are usually-but-not-always bioinformatics related. A blog is a great way to discuss these ideas casually and openly with other scientists and non-scientists alike in plain English. This includes talking about news, controversies, hot topics from research conferences we attend, and our own research. Not only do we want to discuss current cutting edge research, but also explore the foundations of what is routine today. Did you know that a lot of statistical theory originated from studying gambling odds?

2. Bio-for-what? I’ve been asked this question many times after telling people that I do “bioinformatics” for a living. This blog, aside from talking about cool current research, is meant to shed some light on a field that is familiar to some and mysterious magic to others, but is the only way we can make dramatic strides in a variety of biological fields including evolution, molecular biology, population genetics, medicine, and the list goes on. There’s lots of information to learn from the huge amounts of high-throughput biological data, and we are here to decipher what these genomes are talking about.

3. A go-to interdisciplinary resource. Not only do we want to talk about cool stuff, but we also want to be a resource to help everyone be a little bit more interdisciplinary. Are you a biologist who wants to learn a few basic concepts about Unix? Are you a computer scientist who wants to learn about the Central Dogma of molecular biology? Which alignment tool is best for testing your hypothesis? How does sequencing work anyway? We’ll give you the quick rundown of these topics with links to helpful resources.

4. And a little bit of self-advertisement.  We are based out of UCLA (GO BRUINS!) so we will shamelessly tell you about our own projects and plug our program. Join us! Join us! But seriously, if you are considering a career in academia or bioinformatics, we are a resource for you to ask questions and get an idea of what you would be getting yourself into at UCLA or otherwise. And since we are at UCLA, we’ll tell you about great bioinformatics resources on campus.

In conclusion, welcome to our blog! Don’t be shy to comment and join the discussion.

Oh yeah, here’s the first shameless plug – follow us on Twitter! @talkinggenomes

And contact us at talkinggenomes@gmail.com