Mendel's Genetics #16
base: main
Once the build has completed, you can preview your PR at this URL: https://biojulia.dev/BiojuliaDocs/previews/PR16/
Just noting that the comment is being posted, but the link doesn't actually work. Probably unrelated to the above: for some reason, your pull request is requesting to merge into another branch rather than into
kescobo left a comment
Another solution would be to use StatsBase.jl and do weighted sampling.
One other thing that would be nice to include here is a bit more didactic discussion of how we often write algorithms that are narrowly tailored to one problem, and then either repeat ourselves or make them more complicated as additional requirements get tacked on. E.g., your solution works for this specific problem, but we'd have to derive a new equation if the question were something like "What's the probability of a heterozygous offspring?" It also doesn't scale up if we add another trait, etc.
The nice thing about the StatsBase.jl solution, and even a simulation, is that they can be made generic and then used to ask more types of questions. I'm not necessarily demanding we add this to a first draft, but maybe open an issue as a potential enhancement.
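A minimal sketch of the "generic exact" idea, assuming the PR's genotype coding (1 = HH, 2 = Hh, 3 = hh): compute the full offspring genotype distribution once, then answer different questions from it. The function name `offspring_distribution` and the `P_TRANSMIT_H` table are hypothetical, not taken from the PR.

```julia
# Hypothetical helper table: probability that a parent of each genotype
# passes on the dominant allele H. Codes: 1 = HH, 2 = Hh, 3 = hh.
const P_TRANSMIT_H = (1.0, 0.5, 0.0)

"""
    offspring_distribution(k, m, n)

Exact probabilities [P(HH), P(Hh), P(hh)] for the offspring of two distinct
individuals drawn (without replacement) from k HH, m Hh and n hh organisms.
"""
function offspring_distribution(k, m, n)
    counts = (k, m, n)
    total = k + m + n
    dist = zeros(3)
    for i in 1:3, j in 1:3
        # Probability of drawing genotype i first, then genotype j.
        second = j == i ? counts[j] - 1 : counts[j]
        p_pair = counts[i] / total * second / (total - 1)
        a, b = P_TRANSMIT_H[i], P_TRANSMIT_H[j]
        dist[1] += p_pair * a * b                          # both parents pass H
        dist[2] += p_pair * (a * (1 - b) + (1 - a) * b)    # exactly one passes H
        dist[3] += p_pair * (1 - a) * (1 - b)              # both parents pass h
    end
    return dist
end

d = offspring_distribution(2, 2, 2)
p_dominant = d[1] + d[2]       # the IPRB question
p_heterozygous = d[2]          # a different question, same function
```

Asking another question (say, P(hh)) then becomes a lookup into `d` rather than a new derivation.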
I like the idea of a simulation, though it will generally not give a precisely correct answer for Rosalind. I think that's fine if that's explained.
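To make "not precisely correct" concrete, here is a rough sketch of how far a simulation's estimate typically wanders. For N independent 0/1 trials with success probability p, the standard error of the estimated proportion is sqrt(p(1 - p)/N). The helper name is hypothetical; 47/60 (≈ 0.78333) is the exact answer for k = m = n = 2.

```julia
# Standard error of a Monte Carlo estimate of a probability p from N
# independent Bernoulli(p) trials: the sample mean has variance p(1 - p)/N.
mc_standard_error(p, iterations) = sqrt(p * (1 - p) / iterations)

p_exact = 47 / 60   # exact P(dominant offspring) for k = m = n = 2
for iterations in (1_000, 100_000, 10_000_000)
    se = mc_standard_error(p_exact, iterations)
    println("iterations = $iterations: standard error ≈ ", round(se; sigdigits=2))
end
```

Each hundredfold increase in iterations only buys one more correct decimal place. If each trial adds a probability such as `offspring_prob[i, j]` instead of a 0/1 outcome, the variance is smaller, so treat this as an upper bound.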
@kescobo Ready for a final review! I think you've reviewed most of the first part (the algorithm piece), so the main thing to focus on here is the statistical/sampling method.
For instance, we can use a simulation to estimate the probability of a given offspring outcome from a set of given probabilities.
This solution is generic and can be used to ask more types of questions.
The generic solution I was thinking of was actually not to simulate, but rather to be generic with the exact statistics. I like the simulation too, but e.g. outputting the probability matrix you generated would then allow you to count other outputs.
Ah, maybe I can make this function more general by having the probability matrix as an input as well. Is that what you meant here?
Sort of. If you're strictly in Mendelian land, you can think of things in terms of allele frequencies and multiplication of probabilities. I also wonder if it would be worth introducing something about Julia types here... but we can save that for later.
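One possible reading of "allele frequencies and multiplication of probabilities", sketched under the assumption that only the IPRB question is wanted (via P of a recessive hh offspring). Each genotype passes on h with probability 0, 1/2 or 1, and the offspring is hh only if both parents do. Because the second parent is drawn from the remaining individuals, its "passes on h" probability is conditioned on who was drawn first. The function name is hypothetical.

```julia
"""
    prob_recessive_allele_view(k, m, n)

P(offspring is hh) written as a product of "parent passes on h" probabilities.
Genotype codes: 1 = HH, 2 = Hh, 3 = hh.
"""
function prob_recessive_allele_view(k, m, n)
    counts = (k, m, n)
    q = (0.0, 0.5, 1.0)            # probability each genotype passes on h
    total = k + m + n
    # Sum of per-individual "passes on h" probabilities over the population.
    h_pool = sum(counts[g] * q[g] for g in 1:3)
    p = 0.0
    for g in 1:3
        p_first = counts[g] / total                # first parent has genotype g
        # The second parent comes from the other total - 1 individuals,
        # so remove the first parent's own contribution from the pool.
        p_second_h = (h_pool - q[g]) / (total - 1)
        p += p_first * q[g] * p_second_h           # both parents pass on h
    end
    return p
end

p_dominant = 1 - prob_recessive_allele_view(2, 2, 2)
```

The multiplication is valid because, once the two parents' genotypes are fixed, the alleles they pass on are independent.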
docs/src/rosalind/07-iprb.md (outdated)
```julia
function mendel_sim(k, m, n; iterations=100000)
    # Genotypes: 1=HH, 2=Hh, 3=hh
    population = [fill(1, k); fill(2, m); fill(3, n)]
```
I think using a weight vector here makes more sense: if you have millions of individuals, you're going to allocate a giant array. Instead, you can do something like:
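```julia
using StatsBase

total_pop = k + m + n
wts = [k / total_pop, m / total_pop, n / total_pop]
# samples 2 items from the vector [1, 2, 3] with probability weights given by wts
# (note: `sample` draws with replacement unless you pass replace=false)
sample([1, 2, 3], weights(wts), 2)
```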
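One caveat with the weight-vector suggestion: StatsBase's `sample` draws with replacement by default, so the two parents are effectively independent draws, whereas IPRB draws two distinct organisms. Passing `replace=false` does not fix it either, because it would then forbid drawing the same *genotype* twice. A small exact comparison (no StatsBase needed) shows the size of the effect; the `DOMINANT` matrix is an assumed stand-in for the PR's `offspring_prob`, and the function names are hypothetical.

```julia
# Genotype codes: 1 = HH, 2 = Hh, 3 = hh.
# DOMINANT[i, j] = P(offspring shows the dominant phenotype | parents i and j).
const DOMINANT = [1.0 1.0  1.0;
                  1.0 0.75 0.5;
                  1.0 0.5  0.0]

# Parents drawn independently with probabilities k/T, m/T, n/T (with replacement),
# which is what weighted sampling of genotype labels does by default.
function p_dominant_with_replacement(k, m, n)
    w = [k, m, n] ./ (k + m + n)
    return sum(w[i] * w[j] * DOMINANT[i, j] for i in 1:3, j in 1:3)
end

# Two distinct individuals (without replacement), as the IPRB problem states.
function p_dominant_without_replacement(k, m, n)
    c = [k, m, n]
    t = k + m + n
    pair(i, j) = c[i] / t * (c[j] - (i == j)) / (t - 1)
    return sum(pair(i, j) * DOMINANT[i, j] for i in 1:3, j in 1:3)
end

p_dominant_with_replacement(2, 2, 2)      # ≈ 0.75
p_dominant_without_replacement(2, 2, 2)   # ≈ 0.78333
```

The gap shrinks as k + m + n grows, since removing one organism barely changes the proportions, but for small inputs it is larger than Rosalind's rounding.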
docs/src/rosalind/07-iprb.md (outdated)
```julia
dominant_count = sum(
    offspring_prob[sample(population, 2; replace=false)...]
    for i in 1:iterations
)
```
I think this is going to allocate a lot. The canonical way to do this is something like:
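```julia
using StatsBase

# assumes `wts` (above) and the `offspring_prob` matrix are in scope
sum(1:iterations) do _
    (i, j) = sample([1, 2, 3], weights(wts), 2)
    return offspring_prob[i, j]
end
```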
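Putting the review suggestions together, a runnable sketch of the simulation that avoids the population array *and* samples without replacement. Assumptions: it swaps StatsBase's `sample` for a hand-rolled count-based draw from the `Random` standard library (so the first parent can be removed before drawing the second), and `offspring_prob` is filled in from the standard Punnett squares rather than copied from the PR. `mendel_sim_counts` is a hypothetical name.

```julia
using Random

# Genotype codes: 1 = HH, 2 = Hh, 3 = hh.
function mendel_sim_counts(k, m, n; iterations=100_000, rng=Random.default_rng())
    # P(dominant phenotype | parent genotypes i, j), from the Punnett squares.
    offspring_prob = [1.0 1.0  1.0;
                      1.0 0.75 0.5;
                      1.0 0.5  0.0]
    counts = [k, m, n]
    total = k + m + n

    # Pick a genotype with probability proportional to its count in `c`,
    # without materializing one array entry per organism.
    function draw(c, t)
        r = rand(rng, 1:t)
        return r <= c[1] ? 1 : r <= c[1] + c[2] ? 2 : 3
    end

    dominant = sum(1:iterations) do _
        i = draw(counts, total)
        counts[i] -= 1                 # remove the first parent...
        j = draw(counts, total - 1)
        counts[i] += 1                 # ...and restore it for the next trial
        offspring_prob[i, j]
    end
    return dominant / iterations
end

mendel_sim_counts(2, 2, 2; rng=MersenneTwister(1))
```

Passing an explicit `rng` makes runs reproducible, which helps when a docs page shows the simulated number next to the exact one.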
Made some edits based on your last comments! @kescobo I think we're close to being able to merge?


Making a draft PR here. There are multiple ways to solve the problem, and I added a first approach. I'm thinking that the second would be a more statistical/simulation approach. Basically, based on the values of k, m, and n, we can make a vector containing all of the possible organisms (e.g. [HH, Hh, hh, HH, etc.]). Then, we can calculate the proportion of dominant individuals among all individuals.
I wanted to run this by you first and see if you had any suggestions on packages to use.
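The vector-of-organisms idea can also give an exact answer for small inputs if, instead of sampling, every ordered pair of distinct organisms is enumerated and the Punnett-square probabilities are averaged. A minimal sketch, with a hypothetical function name and an assumed `PUNNETT_DOMINANT` table; its cost is quadratic in k + m + n, so it is mainly useful as a cross-check for the other approaches.

```julia
# Genotype codes: 1 = HH, 2 = Hh, 3 = hh.
# P(dominant phenotype | parent genotypes i, j), from the Punnett squares.
const PUNNETT_DOMINANT = [1.0 1.0  1.0;
                          1.0 0.75 0.5;
                          1.0 0.5  0.0]

function p_dominant_enumerate(k, m, n)
    population = [fill(1, k); fill(2, m); fill(3, n)]   # one entry per organism
    total = length(population)
    acc = 0.0
    for a in 1:total, b in 1:total
        a == b && continue                    # an organism cannot mate with itself
        acc += PUNNETT_DOMINANT[population[a], population[b]]
    end
    return acc / (total * (total - 1))        # average over all ordered pairs
end

p_dominant_enumerate(2, 2, 2)   # ≈ 0.78333
```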