Introduction
Statistics and spatial analysis are widely used to support or refute the arguments of different organizations, because they are regarded as an impartial, science-based position that is free from individual interests. This is not completely true, since the scientific process involves a number of decisions that can lead to different results. Being aware of those decisions, however, helps to provide a realistic answer to the problems observed.
In this project, a hypothetical scenario is used to analyze the dynamics of a spatial distribution: in Milwaukee County, civil rights groups claim that lottery ticket sales are concentrated mostly in areas dominated by minorities. The goal of the project is to explore, examine and interpret the related data with geostatistical tools, and to determine whether these claims are justified.
Methodology
To answer this question, the data available were a table with the addresses of the lottery retailers and their sales amounts, and a feature class of census data on the racial composition of the population, including the percentage of non-white population. For the purposes of this project, the non-white population represents the minorities.
The first step in analyzing the data was to geocode the table of lottery addresses using the geocoding tool in ArcMap. After creating the point feature class and gathering the data to be used, a geodatabase was created to keep the dataset reliable and organized. The features also had to be reprojected to minimize distortion. The projection chosen was the Wisconsin State Plane South Zone, where Milwaukee County is located, since the smaller the area a projection is designed for, the less distortion it has.
With these standards in place, the geostatistical and symbolization tools can be applied to explore the data. Before that, it's important to present the concepts of mean center, standard distance and weighting.
The mean center is derived from the simple mean of a sample. However, instead of using an attribute of each entity, it uses the coordinates. The mean of all the X coordinates in the sample gives the X mean, and the mean of all the Y coordinates gives the Y mean. Together, the X mean and the Y mean define a point feature that represents the mean center.
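The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the ArcToolbox implementation: the function name `mean_center` and the coordinates are hypothetical, and the points are assumed to be in a projected coordinate system.

```python
from statistics import fmean

def mean_center(points):
    """Return the (x, y) mean center of a list of (x, y) coordinate pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # The mean of the X values and the mean of the Y values form the center.
    return fmean(xs), fmean(ys)

# Hypothetical projected coordinates, not the real lottery locations.
outlets = [(0.0, 0.0), (4.0, 0.0), (4.0, 6.0), (0.0, 6.0)]
center = mean_center(outlets)
print(center)  # (2.0, 3.0)
```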
The standard distance can be considered a spatial version of the simple standard deviation. In this case the mean, where the standard deviation is 0, is the mean center. Because it measures distance from the mean center, the standard distance is never negative. A circle is drawn around the mean center with a radius of one standard deviation, calculated from the relation between each feature's coordinates and the mean center coordinates. Features inside the circle lie within the first standard deviation.
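As a rough sketch of the radius described above (the standard distance), the squared deviations of the X and Y coordinates from the mean center can be averaged, added together, and square-rooted. The function name and the coordinates are hypothetical, and dividing by n (rather than n - 1) is an assumption about the formula used.

```python
import math

def standard_distance(points):
    """Radius of one standard deviation around the mean center."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Average squared deviation from the mean center on each axis.
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    return math.sqrt(var_x + var_y)

# Hypothetical projected coordinates, not the real lottery locations.
outlets = [(0.0, 0.0), (4.0, 0.0), (4.0, 6.0), (0.0, 6.0)]
radius = standard_distance(outlets)
print(round(radius, 3))
```

The result is in the same units as the coordinates, which is why a projected coordinate system matters for this measure.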
In both cases, a different analysis can be made by adding a weight to the calculation. That's useful when the features have an important attribute that you want to explore. The value of the chosen attribute gives each feature a different level of importance in the calculation of the mean and the standard deviation, which is why the result is said to be weighted. Without the weight, each feature has equal relevance in the calculation; with the weight, each feature counts in proportion to the chosen attribute.
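To illustrate how a weight changes the result, the sketch below computes a weighted mean center and a weighted version of the one-standard-deviation radius. The function names, coordinates and sales figures are all hypothetical; the weighting formula (each squared deviation multiplied by the weight and divided by the total weight) is a common definition assumed here, not taken from the project.

```python
import math

def weighted_mean_center(points, weights):
    """Mean center where each point counts in proportion to its weight."""
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    return cx, cy

def weighted_standard_distance(points, weights):
    """One-standard-deviation radius around the weighted mean center."""
    total = sum(weights)
    cx, cy = weighted_mean_center(points, weights)
    var_x = sum(w * (x - cx) ** 2 for (x, _), w in zip(points, weights)) / total
    var_y = sum(w * (y - cy) ** 2 for (_, y), w in zip(points, weights)) / total
    return math.sqrt(var_x + var_y)

# Hypothetical data: the two southern points (y = 0) sell three times more.
outlets = [(0.0, 0.0), (4.0, 0.0), (4.0, 6.0), (0.0, 6.0)]
sales = [3.0, 3.0, 1.0, 1.0]

print(weighted_mean_center(outlets, sales))  # (2.0, 1.5)
print(round(weighted_standard_distance(outlets, sales), 3))
```

With equal weights the center of these four points is (2.0, 3.0); the heavier southern points pull it down to (2.0, 1.5).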
At first, these measures are applied to the pure location of the lottery retailers, without considering the differences in their sales. Then, with the weight, the sales amount of each retailer refines the result and allows a deeper interpretation. To apply all these concepts, the Spatial Statistics Tools in ArcToolbox are used.
Also, from a more general perspective, z-scores and probabilities are analyzed for the whole county and for three selected tracts. The z-score is simply the exact number of standard deviations that the value of a specific feature lies from the mean. Standard deviations are usually presented as whole-number marks at -3, -2, -1, +1, +2 and +3. With the z-score, it's possible to determine the exact position of a feature within them, for example somewhere between 0 and +1.
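A minimal sketch of the z-score calculation follows. The function name and the sales values are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not say whether the population or the sample formula was used.

```python
from statistics import fmean, pstdev

def z_score(value, values):
    """Number of standard deviations that `value` lies from the mean of `values`."""
    mean = fmean(values)
    sd = pstdev(values)  # population standard deviation (assumption)
    return (value - mean) / sd

# Hypothetical sales per tract, not the project's data.
tract_sales = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(z_score(9.0, tract_sales))  # two standard deviations above the mean
print(z_score(5.0, tract_sales))  # exactly at the mean
```

A positive z-score means the tract sells more than the mean, a negative one means it sells less.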
Each z-score corresponds to a probability, which allows analyzing the chances that an attribute will be higher or lower than a number of your choice in a given year, for example. For that, a standard statistical table is used, in which you can look up the probability for a given z-score. The inverse process is also possible, when you have the probability and need to find the corresponding z-score.
However, using the table involves some approximations, since it does not list every possible z-score and probability. Because of that, to guarantee the precision of the results in this project, the NORM.INV function of Microsoft Excel is used to provide the result directly, without the approximations of the table.
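For readers without Excel, the same inverse lookup can be sketched with Python's standard library. NORM.INV takes a cumulative probability (the chance of being at or below a value), so a question such as "which value is exceeded 70% of the time" uses 1 - 0.70 = 0.30. The function name, the mean and the standard deviation below are hypothetical and are not the project's figures; the sketch also assumes sales follow a normal distribution.

```python
from statistics import NormalDist

def sales_exceeded(p_exceed, mean, sd):
    """Sales value exceeded with probability p_exceed under a normal model.

    Mirrors Excel's NORM.INV(1 - p_exceed, mean, sd).
    """
    return NormalDist(mu=mean, sigma=sd).inv_cdf(1.0 - p_exceed)

# Hypothetical county mean and standard deviation of sales.
mean_sales = 300_000.0
sd_sales = 400_000.0

exceeded_70 = sales_exceeded(0.70, mean_sales, sd_sales)
exceeded_20 = sales_exceeded(0.20, mean_sales, sd_sales)
print(round(exceeded_70), round(exceeded_20))
```

A value exceeded 70% of the time falls below the mean, while a value exceeded only 20% of the time falls above it.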
Lastly, symbolization in ArcMap is used to visualize and classify the data so that it can be easily interpreted. Proportional symbol and choropleth maps are used to support the results and reach a conclusion about the spatial question.
Results
First, it's necessary to have a preliminary view of the characteristics of Milwaukee County. In Map 1 it's possible to notice a concentration of non-white population in the north-western region of Milwaukee, and the concentration is extremely high in the area outlined in red.
Thus, the analysis focuses on this area, examining whether high sales occur there or close to it. The symbolization of the percentage of non-white population is the background of most of the other maps. That helps to analyze the sales results, and a transparency was applied to keep the maps from being cluttered.
Outlined in pink are the tracts selected in this exercise to have the z-score calculated. In the northern tract, the z-score of 2.26 shows that this tract has a high amount of sales, far from the mean. The situation is more extreme in the eastern tract, where the z-score of 7.95 can even be considered an outlier, because the sales are extremely high. In the western tract the situation can be considered normal, since the z-score of 0.39 is very close to the mean. Also answering the exercise questions, using the formula in Microsoft Excel (Figure 1) for the entire county, lottery sales will exceed US$91,504 70% of the time, but will exceed US$628,122 only 20% of the time.
Figure 1 – Use of Excel to find exact results.
Map 1 – Non-white distribution in Milwaukee
In Map 2, it's possible to see that lottery sales are concentrated in the south, with the exception of some tracts in the north-east and in the center. However, it's also important to consider the size of each tract. The tracts farther from the center are larger, so it's natural that their sales are higher. On the other hand, none of these high-sales tracts coincides with a high concentration of non-white population. Thus, based only on these two maps, sales appear to be concentrated in tracts with a predominantly white population.
Map 2 – Sales Distribution in Milwaukee
The analysis of the mean centers in 2007 and 2009 (Maps 3 and 4) shows that when the weight is applied, the mean center shifts to the south, where there's no concentration of non-white population.
Map 3 – Mean Centers in 2007
Map 4 – Mean Centers in 2009
When the standard distance is applied in both years (Map 5 and Map 6), there's no big temporal difference. In both cases, the standard distance covers a portion of the “non-white area”, but also a large part of the “white area”. Thus, it doesn't seem to indicate a clear concentration in either white or non-white regions.
Map 5 – Sales Distribution in 2007
Map 6 – Sales Distribution in 2009
However, when both years are compared in a single map (Map 7), a slight shift is noticeable: in 2009 the standard distance lies a little farther to the north-east than in 2007.
Map 7 – Comparison between 2007 and 2009
The reason can be seen by looking more closely at the comparison between the mean centers of 2007 and 2009 (Map 8). Over the two-year period, both the unweighted and the weighted mean centers shifted north, toward the area with a high non-white population. It's a small shift that doesn't even change the tract where the mean center is located; however, it shows an important result about the variation over time.
Map 8 – Comparison of Mean Centers (2007-2009)
Conclusion
Considering the first results, there's apparently no racial discrimination related to the lottery sales. The high sales amounts do not fall in tracts with a non-white concentration. The unweighted mean center falls near the center of the county itself, so it doesn't suggest a considerable difference either. In fact, the opposite idea is found when analyzing the mean center weighted by sales amount: it shifts to the south, where the concentration of white population is higher. Everything suggests that there's no basis for the allegations of the civil rights groups.
However, when analyzing the temporal dimension, a shift toward the area where the non-white concentration is higher is noticeable. So it's possible to affirm that until 2009 there was no big difference in the amount of sales depending on the predominant race in a given region. However, the results suggest that the situation can change in the future, because the mean center tends to shift north, where the non-white population is concentrated.
Studies on this matter should be extended to more recent years, in order to analyze how the temporal change is occurring. It's also important to understand the limitations of the analysis, since some factors were not considered. The absolute amount of sales can also be related to the size of the tracts and the total population living in them. In addition, the people who live in a given tract don't necessarily buy lottery tickets only within that tract. This is especially true in an urban area, where people's mobility is higher. Therefore, the results of this project show that, for now and with this data, there's no basis for the allegations of the civil rights groups; however, that doesn't mean the matter should be ignored. On the contrary, this project encourages further studies, not only with the same concepts but also including other variables.
