About Me

Geographer from Unicamp (2014), including one year of university exchange at the University of Wisconsin (USA). Has experience in geotechnologies, GIS and urban planning, having interned at Agemcamp, the American Red Cross and, currently, the Grupo de Apoio ao Plano Diretor da Unicamp.

Tuesday, March 12, 2013

Lottery Sales in Milwaukee - Mean Center, Standard Distance and Weight


Introduction

Statistical and spatial analyses are widely used to support or refute the arguments of different organizations, since they are regarded as impartial, science-based positions free from individual interests. This is not completely true, because the scientific process involves a number of decisions that can lead to different results; being aware of those decisions, however, makes it possible to give a realistic answer to the problems observed.

This project uses a hypothetical scenario to analyze the dynamics of a spatial distribution: in Milwaukee County, civil rights groups claim that lottery ticket sales are concentrated mostly in areas dominated by minorities. The goal of the project is to explore, examine and interpret the related data with geostatistical tools, in order to determine whether these claims are justified.

Methodology

To answer this question, the available data were a table with the addresses of the lottery outlets and their sales amounts, and a feature class of census data on population race, including the percentage of non-white population. For the purposes of this project, the non-white population represents the minorities.

The first step in analyzing these data is to geocode the table of lottery addresses with the geocoding tool in ArcMap. After creating a point feature class and gathering the data to be used, a geodatabase was created to keep the dataset reliable and organized. The features also had to be reprojected to minimize distortion. The projection chosen was Wisconsin State Plane – South Zone, where Milwaukee County is located, since the smaller the area a projection is designed for, the less distortion it introduces.

With these standards set up, the next step is to apply the geostatistical and symbolization tools to explore the data. For that, it's important to present the concepts of mean center, standard distance and weighting.
The mean center is derived from the simple mean of a sample. However, instead of using an attribute of each entity, it uses the coordinates: the mean of all the X coordinates in the sample gives the X mean, and the mean of all the Y coordinates gives the Y mean. The combination of the X mean and the Y mean gives a point that represents the mean center.
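The calculation described above can be sketched in a few lines of Python. This is only an illustration of the concept, not the ArcGIS tool itself: the function name `mean_center` and the sample coordinates are hypothetical, and the coordinates are assumed to be already projected (for example, in a State Plane system).

```python
from statistics import fmean


def mean_center(points):
    """Return the (x, y) mean center of a list of (x, y) coordinate pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # The X mean and the Y mean are computed independently
    # and then combined into a single point.
    return fmean(xs), fmean(ys)


# Hypothetical projected coordinates of four lottery outlets.
outlets = [(0.0, 0.0), (4.0, 0.0), (4.0, 6.0), (0.0, 6.0)]
center = mean_center(outlets)
print(center)
```

Every point counts equally here, which is the unweighted case.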

Standard distance can be considered a spatial version of the simple standard deviation. The mean, where the standard deviation is 0, is in this case the mean center. Because distance from the mean center is measured in every direction, there is no negative standard deviation. The result is a circle centered on the mean center whose radius is one standard distance, calculated from the differences between each feature's coordinates and the mean center coordinates.
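As a rough sketch of this idea, the radius can be computed as the square root of the average squared distance between each point and the mean center. The function name and the sample points are hypothetical, and the code assumes the population form (dividing by n); the exact formula used by a given GIS tool should be checked in its documentation.

```python
import math


def standard_distance(points):
    """Return the standard distance of (x, y) points around their mean center."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Sum of squared distances from each point to the mean center.
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    # Assumption: population form, dividing by n rather than n - 1.
    return math.sqrt(squared / n)


# Hypothetical projected coordinates of four lottery outlets.
outlets = [(0.0, 0.0), (4.0, 0.0), (4.0, 6.0), (0.0, 6.0)]
radius = standard_distance(outlets)
print(radius)
```

The result is always zero or positive, which matches the point made above that there is no negative standard distance.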

In both cases, a different analysis can be made by adding a weight to the calculation. That's useful when the features have an important attribute that you want to explore. The values of the chosen attribute give each feature a different level of importance in the calculation of the mean and the standard deviation, which is why the parameter is said to be weighted. Without the weight, each feature has equal relevance in the calculation; with the weight, each feature is treated differently, depending on the attribute chosen.
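To make the effect of the weight concrete, here is a minimal sketch of a weighted mean center, where each coordinate is multiplied by its weight and the sums are divided by the total weight. The function name, the two sample outlets and their sales values are all hypothetical.

```python
def weighted_mean_center(points, weights):
    """Return the (x, y) mean center of points, weighted by an attribute."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    wx = sum(w * x for (x, _), w in zip(points, weights)) / total
    wy = sum(w * y for (_, y), w in zip(points, weights)) / total
    return wx, wy


# Two hypothetical outlets: one in the north (y = 10), one in the south (y = 0).
outlets = [(0.0, 10.0), (0.0, 0.0)]
sales = [100.0, 300.0]  # hypothetical sales used as the weight

# Equal weights reproduce the unweighted mean center.
print(weighted_mean_center(outlets, [1.0, 1.0]))
# With sales as the weight, the center is pulled toward the outlet that sells more.
print(weighted_mean_center(outlets, sales))
```

In this toy case the unweighted center sits halfway between the two outlets, while the weighted center moves toward the southern outlet because it has three times the sales.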

At first, these parameters are applied to the location of the lottery outlets alone, without considering the differences in their sales. Then, with the weight, the sales amount of each outlet refines the result and allows a deeper interpretation. All these concepts are applied with the Spatial Statistics Tools in ArcToolbox.

Also, from a more general perspective, Z-scores and probabilities are analyzed for the whole county and for three selected tracts. The Z-score is simply the exact number of standard deviations that a specific feature's value lies from the mean. Standard deviations are usually presented at six whole-number marks: -3, -2, -1, +1, +2 and +3. The Z-score makes it possible to determine the exact position of a feature, for example, between 0 and +1.
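A small sketch may help here: the Z-score of a value is its difference from the mean divided by the standard deviation. The function name and the sales figures are hypothetical, and the code assumes the population standard deviation; whether the population or the sample form fits a given exercise is a separate decision.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Return the Z-score of each value relative to the whole list."""
    mu = fmean(values)
    # Assumption: population standard deviation (pstdev), not the sample form.
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical sales per tract, in arbitrary units.
tract_sales = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scores = z_scores(tract_sales)
print(scores)
```

For these values the mean is 5 and the standard deviation is 2, so the tract with sales of 9 has a Z-score of +2 and the tract with sales of 2 has a Z-score of -1.5.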

Each Z-score corresponds to a probability, which allows the analysis of the chances that an attribute will be higher or lower than a number of your choice in a year, for example. For that, a standard statistical table is used, in which you can look up the probability for a given z-score. The inverse process is also possible, when you have the probability and need to find the corresponding z-score.

However, the table involves some approximations, since it does not list every possible z-score and probability. For that reason, to guarantee the precision of the results in this project, the NORM.INV function of Microsoft Excel is used, which returns the value directly instead of an approximation read from the table.
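For readers without Excel, the same kind of lookup can be sketched with `NormalDist.inv_cdf` from Python's standard library, which, like NORM.INV, takes a cumulative probability, a mean and a standard deviation. The function name `exceeded_threshold` is hypothetical, and the mean and standard deviation below are made-up values, not the ones from this project.

```python
from statistics import NormalDist


def exceeded_threshold(mean, std_dev, share_of_time):
    """Return the value a normal variable exceeds `share_of_time` of the time."""
    if not 0.0 < share_of_time < 1.0:
        raise ValueError("share_of_time must be between 0 and 1")
    # inv_cdf works with the cumulative probability P(X <= x), so a value
    # exceeded 70% of the time is the 30th percentile.
    return NormalDist(mean, std_dev).inv_cdf(1.0 - share_of_time)


# Hypothetical parameters, for illustration only.
HYPOTHETICAL_MEAN = 1000.0
HYPOTHETICAL_STD_DEV = 200.0

low = exceeded_threshold(HYPOTHETICAL_MEAN, HYPOTHETICAL_STD_DEV, 0.70)
high = exceeded_threshold(HYPOTHETICAL_MEAN, HYPOTHETICAL_STD_DEV, 0.20)
print(low, high)
```

The value exceeded 70% of the time falls below the mean, and the value exceeded only 20% of the time falls above it, which is the same pattern as in the results reported later in this post.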

Lastly, symbolization in ArcMap is used to visualize and classify the data so that it can be easily interpreted. Proportional symbol and choropleth maps are used to support the results and reach a conclusion about the spatial question.

Results

First, it's necessary to have a preliminary view of the characteristics of Milwaukee County. Map 1 shows a concentration of non-white population in the north-western region of Milwaukee, which is extremely high in the area outlined in red.

Thus, the analysis focuses on this area, examining whether high sales occur there or close to it. The symbolization of the percentage of non-white population is the background of most of the other maps. That helps in analyzing the sales results, and transparency was applied to keep the maps from becoming cluttered.

Outlined in pink are the tracts selected in this exercise to have their Z-scores calculated. In the northern tract, the z-score of 2.26 shows that this tract has a high amount of sales, far from the mean. The situation is more extreme in the eastern tract, where the z-score is 7.95, which can even be considered an outlier because the sales are extremely high. In the western tract, the situation can be considered normal, since the z-score is 0.39, very close to the mean. Also answering the exercise questions, the formula in Microsoft Excel (Figure 1) shows that, for the entire county, lottery sales will exceed US$91,504 70% of the time, but will exceed US$628,122 only 20% of the time.

Figure 1 – Use of Excel to find exact results.

Map 1 – Non-white distribution in Milwaukee

Map 2 shows that lottery sales are concentrated in the south, with the exception of some tracts in the north-east and in the center. However, it's also important to consider the size of each tract. The tracts farther from the center are larger, so it's natural that their sales are higher. On the other hand, none of these high-sales tracts coincides with a high concentration of non-white population. Thus, based only on these two maps, sales appear to be concentrated in tracts with a predominantly white population.

Map 2 – Sales Distribution in Milwaukee

The analysis of the mean centers in 2007 and 2009 (Maps 3 and 4) shows that when the weight is applied, the mean center shifts to the south, where there is no concentration of non-white population.

Map 3 – Mean Centers in 2007

Map 4 – Mean Centers in 2009



When the standard distance is applied in both years (Map 5 and Map 6), there's no big temporal difference. In both cases, the standard distance circle covers a portion of the “non-white area”, but also a large part of the “white area”. Thus, it doesn't seem to indicate a clear concentration in either white or non-white regions.

Map 5 – Sales Distribution in 2007

Map 6 – Sales Distribution in 2009

However, Map 7, which compares both years in a single map, reveals a slight shift: in 2009 the standard distance lies a little farther to the north-east than in 2007.

Map 7 – Comparison between 2007 and 2009

The reason can be seen by looking more closely at the comparison between the mean centers of 2007 and 2009 (Map 8). Over the two-year period, both the unweighted and the weighted mean centers shifted north, toward the area with a high non-white population. It's a small shift that doesn't even change the tract where the mean center is located; however, it is an important result about the variation over time.
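A shift like this can also be described with numbers instead of only by eye. The sketch below computes the distance and compass bearing between two mean centers in projected coordinates. The function name and both sets of coordinates are hypothetical, not the values from the maps.

```python
import math


def center_shift(start, end):
    """Return (distance, bearing) of the move from one mean center to another.

    The bearing is in degrees clockwise from north (0 = north, 90 = east).
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    distance = math.hypot(dx, dy)
    # atan2(dx, dy) measures the angle from the +Y axis (north), clockwise.
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return distance, bearing


# Hypothetical mean centers for the two years, in projected units.
center_2007 = (1000.0, 2000.0)
center_2009 = (1000.0, 2150.0)

distance, bearing = center_shift(center_2007, center_2009)
print(distance, bearing)
```

In this made-up case the center moves 150 units due north, so the bearing is 0 degrees.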

Map 8 – Comparison of Mean Centers (2007-2009)


Conclusion

Considering the first results, apparently there's no racial discrimination related to the lottery sales. High sales do not fall in tracts with a non-white concentration. The unweighted mean center falls near the center of the county itself, so it doesn't suggest a considerable difference either. In fact, the opposite is found when analyzing the mean center weighted by the amount of sales: it shifts to the south, where the concentration of white population is higher. Everything suggests that there's no basis for the allegations of the civil rights groups.

However, the temporal analysis shows a shift toward the area where the non-white concentration is higher. It's therefore possible to say that, up to 2009, there was no big difference in the amount of sales depending on the predominant race in a given region. However, the results suggest that the situation may change in the future, because the mean center tends to shift north, where the non-white population is concentrated.

Studies on this matter should be extended to more recent years in order to analyze how the temporal change is occurring. It's also important to understand the limitations of the analysis, since some factors were not considered. The absolute amount of sales can also be related to the size of the tracts and the total population living in them. In addition, the people who live in a given tract don't necessarily buy lottery tickets only within that tract. This is especially true in an urban area, where people are more mobile. Therefore, the results of this project show that, for now and with this data, there's no basis for the allegations of the civil rights groups; however, that doesn't mean the matter should be ignored. On the contrary, this project encourages further studies, not only with the same concepts but also including other variables.
