Basically exactly what the title says. In case there isn’t a great place, or this post ends up getting more visibility than wherever I end up asking, I will explain my approximate competency level and the question below.
In terms of competency I have an engineering background and degree, which means I had a single class in statistics. Technically I was one class short of a math minor (Graph Theory) when I graduated. Unlike most engineers and Six Sigma “graduates” I don’t think this automatically makes me some kind of math/stats wizard. I’m aware I know just enough that I can unintentionally massage data to fit my bias (mini rant over).
My question is: when looking at a human population and trying to estimate the size of the subset of people with certain attributes, how are correlations between those attributes handled to avoid double counting?
For example, let’s say I am looking at a specific city and my data sets are the most recent census, BLS.gov, and Pew Research. With the above sources I can pretty easily estimate something along the lines of
The number of men in a US city who:
- Are between the ages of 22 and 44
- Have a STEM degree
However, if I then wanted to add another factor:
- Are/Vote liberal
I know that is going to interact with the original criteria, because higher levels of education are correlated with people being more liberal. So if I just multiplied the percentages from all three sources together, the resulting number would likely be quite a bit smaller than reality.
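To put numbers on the gap I mean, here is a rough Python sketch. Every figure in it is invented for illustration (none of it comes from the census, BLS, or Pew), and the function names are just ones I made up. It compares multiplying the overall percentages together, which quietly assumes the traits are independent, against using the share of liberals among STEM-degree holders specifically.

```python
# All inputs are hypothetical placeholders, NOT real census, BLS, or Pew figures.
men_22_44 = 100_000          # assumed count of men aged 22-44 in the city
p_stem = 0.10                # assumed share of that group with a STEM degree
p_liberal = 0.40             # assumed share of the whole group that is liberal
p_liberal_given_stem = 0.55  # assumed share of STEM-degree holders that is liberal


def naive_estimate(n, p_a, p_b):
    """Multiply overall shares; this assumes traits A and B are independent."""
    return n * p_a * p_b


def conditional_estimate(n, p_a, p_b_given_a):
    """Chain rule: P(A and B) = P(A) * P(B | A)."""
    return n * p_a * p_b_given_a


naive = naive_estimate(men_22_44, p_stem, p_liberal)
adjusted = conditional_estimate(men_22_44, p_stem, p_liberal_given_stem)

print("naive (independence assumed):", round(naive))
print("using conditional share:     ", round(adjusted))
```

With these made-up inputs the naive product comes out lower than the conditional version, which is the undercount I am worried about. The catch is that the conditional share is exactly the number I don’t know how to get when each percentage comes from a different source.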
Is there a term or method I can read up on for how to account for overlaps/correlations between population subsets? Does this make sense or am I asking the wrong kind of question?
FWIW none of this is related to my job, an argument, a shit post, a data graphic, or anything else I will ever really make. It’s just for something specific (not actually the above example, but something like it using the sources I mentioned) that I am personally curious about. I have also more generally been wondering about how to account for this kind of overlap for a couple of years now.
Regardless, thanks for taking the time to at least read all this.
Cheers!