Quite Frankly I'm Not Sure What To Tag This - Tumblr Posts

Okay so I saw that "shorter" and "taller" were pretty much exactly the same, and I thought yeah, that makes sense (it's pretty surprising how much that makes sense): if you pull a random sample of 2 people of different heights, the odds that you pull the taller one first are the same as the odds that you pull the shorter one first. But 10.2% (the percentage at the time I'm doing this) seemed a little high for randomly pulling people that are exactly the same height. So that got me curious: assuming the tumblr population reflects the global population (which very well may not be true), what are the odds that we'd see at least 10.2% of the population being the same height as the person pulled right before them?

Kinda longish stats rambling under the cut so if you don't want to look at it (I'd rather you would :D) you can go right along. There is a tl;dr at the end if you just want results.

Take this with a grain of salt, I'm not sure how good my practices are. I'm modeling human height as a mix of two normal distributions, one male and one female. That's obviously a simplification of the human experience, but it'll do for the height distribution I'm going to use. I took the means (69 in for males and 64 in for females) and standard deviations (3 in and 2.8 in, respectively) for each sex from this site; I didn't vet it, but the numbers didn't seem unreasonable. It uses height data from Europe, North America, Australia, and East Asia from people born between 1886 and 1994. I looked for tumblr sex distributions and found conflicting reports, so I'm just going 50/50 for simplicity. I found this site and this site suggesting it is 50/50, so hopefully I'm not too far off.

Hypothesis testing time! My hypothesis is that people are more likely to actually vote on this poll if they are the same height as the prev.

I ran 10,000 simulations of pulling a random sample of 50,000 people and for each simulation recorded what percentage of people had the same height (rounded to the nearest inch) as the person pulled directly before them. Each person has a 50/50 chance of being AFAB or AMAB, and then their specific height is picked from a normal distribution based on sex. I then made a histogram for all of these simulations to visualize these results and:

[image: histogram of the same-height percentages across the simulations, 50/50 split]

WOW! that's quite far away. Pretty much 0% of these simulations had a percentage at or above 10.2%. So uh. Pretty unlikely. Pretty cool! I'd put a handful of asterisks next to this figure.
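For anyone who wants to poke at this, here's a minimal sketch of roughly what the simulation described above looks like. It's a reconstruction using only Python's standard library, not the exact code behind the histogram, and the names (`same_height_fraction`, `simulate`, `p_female`) are made up for illustration. The means and standard deviations are the ones quoted above. The run sizes at the bottom are much smaller than 10,000 x 50,000 so it finishes quickly in plain Python.

```python
import random

# (mean, standard deviation) in inches, as quoted in the post.
MALE = (69.0, 3.0)
FEMALE = (64.0, 2.8)


def same_height_fraction(n_people, p_female=0.5, rng=random):
    """Fraction of people whose rounded height equals the previous person's."""
    matches = 0
    prev = None
    for _ in range(n_people):
        # Pick a sex first, then draw a height from that sex's normal distribution.
        mean, sd = FEMALE if rng.random() < p_female else MALE
        height = round(rng.gauss(mean, sd))  # nearest inch
        if prev is not None and height == prev:
            matches += 1
        prev = height
    # The first person has no "prev", so there are n_people - 1 comparisons.
    return matches / (n_people - 1)


def simulate(n_sims, n_people, p_female=0.5, seed=0):
    """Run n_sims independent samples and return each one's same-height fraction."""
    rng = random.Random(seed)
    return [same_height_fraction(n_people, p_female, rng) for _ in range(n_sims)]


if __name__ == "__main__":
    threshold = 0.102  # the poll's "same height" share at the time of the post
    fractions = simulate(n_sims=200, n_people=5_000)
    share_above = sum(f >= threshold for f in fractions) / len(fractions)
    print(f"mean same-height fraction: {sum(fractions) / len(fractions):.4f}")
    print(f"share of simulations at or above {threshold:.1%}: {share_above:.4f}")
```

Changing `p_female` to something like 0.75 or 1.0 gives the other scenarios, and the share of simulations at or above the threshold plays the role of the "percentage of simulations above the 10.2% line".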

But tumblr has a reputation for being mostly female. Whether or not that's true, I thought it would be fun to look at a scenario where the voters here are split 75/25 female/male, plus scenarios where the voters are all female and all male. So here's the breakdown for these guys. I knocked the simulations down to 1,000 because I am impatient and didn't want to wait more than 2 seconds.

For all females:

[image: histogram of the same-height percentages, all-female scenario]

Actually a decent chunk of the simulations (0.9%) make it above the 10.2% line. Still kind of unlikely but definitely not outside the realm of possibility.

For all males:

[image: histogram of the same-height percentages, all-male scenario]

Again 0% chance of landing at or above 10.2% of the votes being the same height as prev! And that's with just a 0.2 inch difference in standard deviation.

And my 75/25 female/male breakdown:

[image: histogram of the same-height percentages, 75/25 female/male split]

Also 0%. Jeez that's far.

I did one more 85/15 split just for funsies:

[image: histogram of the same-height percentages, 85/15 female/male split]

Still 0%. Kinda wild that introducing a little bit of sex variation changes it this much. If I'm doing something wrong conceptually, please feel free to let me know.
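One way to sanity-check why the sex split matters so much, without running any simulations: if heights are independent draws, the chance that one person matches the person before them is the sum over every inch value of (probability of that height) squared. Mixing two distributions with different means spreads the probability over more inch values, so that sum drops. Below is a rough sketch of that calculation, assuming the same means and standard deviations as above. The function names and the 30 to 110 inch range are arbitrary choices for illustration, and this is not something the histograms above were built from.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def rounded_height_pmf(p_female, lo=30, hi=110):
    """Probability of each height in whole inches for a given female share."""
    pmf = {}
    for h in range(lo, hi + 1):
        # A height rounds to h when it falls between h - 0.5 and h + 0.5.
        p_f = normal_cdf(h + 0.5, 64.0, 2.8) - normal_cdf(h - 0.5, 64.0, 2.8)
        p_m = normal_cdf(h + 0.5, 69.0, 3.0) - normal_cdf(h - 0.5, 69.0, 3.0)
        pmf[h] = p_female * p_f + (1.0 - p_female) * p_m
    return pmf


def match_probability(p_female):
    """Chance that two independent draws land on the same rounded height."""
    return sum(p * p for p in rounded_height_pmf(p_female).values())


for p_female in (1.0, 0.85, 0.75, 0.5, 0.0):
    print(f"{p_female:.0%} female: {match_probability(p_female):.4f}")
```

The printed values are the long-run average same-height rate for each split, so they show where the center of each histogram should sit. They don't say anything about how wide the histogram is, which is what the simulations are for.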

I'm aware that height has changed a lot over the century or so that the original stats cover, so the baseline standard deviations might not be right for these purposes. Also, tumblr users tend to be concentrated in a couple of countries and definitely do not reflect the real age distribution of the population. I tried looking into tumblr's country breakdown and then going to the major countries' height census data, but that was falling a little too deep into the rabbit hole and I don't know German. Surprisingly enough, it looks like governments aren't champing at the bit to tell you the standard deviations of the heights of their populations :/.

Anyways it looks like given the very many assumptions I made about the tumblr population, there's a fighting chance that seeing your prev has the same height as you makes you more inclined to actually vote on it! (yes I'm aware correlation is not causation I just thought this was neat)

tl;dr if the tumblr population is representative of the global population, it is pretty damn unlikely that we'd see 10.2% of people being the same height as their prev just by random chance

