# Formula Questions and XRF Instruments

Answer the following questions:

Chapter 9: Exercises 9.7, 9.8, and 9.12.

Chapter 10: Look up information from the vendor on the following XRF instruments: Rigaku ZSX Primus IV, Rigaku NEX DE, and Bruker S1 Titan.

For each instrument, determine which elements it will detect and give one example of an application from the vendor.

## Notes on Statistics

By S.E. Van Bramer

March 8, 1995

1. Terminology

a. Indeterminate (random) error: evaluate with statistics.

b. Determinate (systematic) error: evaluate with reference standards.

c. Gross error: a big mistake, like spilling everything on the floor.

d. One-sided probability: use a one-sided probability when comparing the size or magnitude of results from two different data sets (i.e., a is larger than b).

e. Two-sided probability: use a two-sided probability when comparing two different data sets for any difference (i.e., a is different from b).

f. Population: the set of all possible measurements. This is an ideal that can only be approached. Greek letters are used to symbolize population statistics.

g. Sample: a set of actual measurements. The distinction between sample and population statistics is most important for a small number of measurements (fewer than 20).

h. t-Test: one of the most powerful and widely used statistical tests. The t-test (Student's t) is used to calculate the confidence interval of a measurement when the population standard deviation (σ) is not known, which is usually the case. The t-test is also used to compare two averages. It corrects for the uncertainty in the sample standard deviation (s) caused by taking a small number of samples.

i. Detection limits:

i. Action limit, $L_C = 2\sigma$: 97.7% certain that an observed signal is not random noise.

ii. Detection limit, $L_D = 3\sigma$: 93.3% certain to detect a signal above the $2\sigma$ action limit when the analyte is at this concentration.

iii. Quantitation limit, $L_Q = 10\sigma$: the signal required for 10% RSD.

iv. Type I error: identification of random noise as signal.

v. Type II error: failure to identify a signal that is present.
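A minimal sketch of these limits, assuming normally distributed noise and that σ is estimated as the sample standard deviation of replicate blank measurements. The function name `detection_limits` and the blank readings are hypothetical; the one-sided normal probability used for the Type I error rate comes from Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist, stdev


def detection_limits(blank_signals):
    """Estimate sigma from replicate blanks and return the action (L_C),
    detection (L_D) and quantitation (L_Q) limits as multiples of sigma."""
    sigma = stdev(blank_signals)  # sample standard deviation as the estimate of sigma
    return {"sigma": sigma, "L_C": 2 * sigma, "L_D": 3 * sigma, "L_Q": 10 * sigma}


# Hypothetical replicate blank signals (arbitrary units)
blanks = [0.101, 0.097, 0.104, 0.099, 0.095]
limits = detection_limits(blanks)

# Fraction of pure noise expected above the 2-sigma action limit (Type I error rate)
type_i_rate = 1 - NormalDist().cdf(2)

# At L_Q the relative standard deviation is sigma / (10 sigma)
rsd_at_lq = limits["sigma"] / limits["L_Q"]

print(limits)
print(f"Type I error rate at L_C: {type_i_rate:.3f}")  # about 0.023, i.e. 97.7% certainty
print(f"RSD at L_Q: {rsd_at_lq:.0%}")
```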

2. Descriptive Statistics. These statistics are used to describe a population or a sample.

a. Population mean (µ) and sample average (x̄):

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \text{or} \qquad \bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$

b. Standard deviation: a measure of the spread in individual data points, reflecting the uncertainty of a single measurement.

i. Population standard deviation (σ). For large sample sets (usually more than 20 measurements) or when the population mean (µ) is known.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

ii. Sample standard deviation (s). For small sample sets (usually fewer than 20 measurements) when the sample average (x̄) is used.

$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$

iii. Pooled standard deviation ($s_{pooled}$). When several small sets have the same sources of indeterminate error (i.e., the same type of measurement but different samples), the standard deviations of the individual data sets may be pooled to determine the standard deviation of the analysis method more accurately.

$$s_{pooled} = \sqrt{\frac{\sum_{i=1}^{N_1} (x_i - \bar{x}_1)^2 + \sum_{j=1}^{N_2} (x_j - \bar{x}_2)^2}{N_1 + N_2 - 2}}$$
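These formulas map directly onto Python's standard `statistics` module: `pstdev` uses the population formula (divide by N) and `stdev` the sample formula (divide by N − 1). The `pooled_std` helper below is a hypothetical sketch that generalizes the pooled formula to any number of sets by dividing by the total number of points minus the number of sets, which reduces to N₁ + N₂ − 2 for two sets. It is applied here to Sets #3 and #4 from Problem Set #1.

```python
from math import sqrt
from statistics import mean, pstdev, stdev


def pooled_std(*data_sets):
    """Pooled standard deviation: squared deviations of each set about its
    own average, summed over all sets, divided by (total N - number of sets)."""
    sum_sq = 0.0
    for data in data_sets:
        avg = mean(data)
        sum_sq += sum((x - avg) ** 2 for x in data)
    dof = sum(len(data) for data in data_sets) - len(data_sets)
    return sqrt(sum_sq / dof)


set3 = [11.892, 10.491, 10.172, 12.480, 11.095]
set4 = [17.656, 16.874, 17.999, 17.825, 19.525, 17.712]

print(f"average   = {mean(set3):.3f}")
print(f"s (N - 1) = {stdev(set3):.4f}")   # sample standard deviation
print(f"sigma (N) = {pstdev(set3):.4f}")  # population formula, for comparison only
print(f"s_pooled  = {pooled_std(set3, set4):.4f}")
```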

c. Standard error of the mean ($\sigma_m$). The standard error of the mean is the uncertainty in the average. This is different from the standard deviation (σ), which is the variation of each individual measurement. Notice that when N is 1 (a single measurement), $\sigma_m = \sigma$.

i. If σ is known, the uncertainty in the mean is:

$$\text{uncertainty } (\sigma_m) = \frac{\sigma}{\sqrt{N}}$$

ii. If σ is unknown, use the t-score to compensate for the uncertainty in s. The value for t is obtained from a table for the appropriate % confidence level and for N − 1 degrees of freedom (N − 1 because one degree of freedom is used to calculate the mean). Since the uncertainty is a range that could be greater or less than the mean, a two-sided value should be used for t.

$$\text{uncertainty } (s_m) = \frac{t\,s}{\sqrt{N}}$$

d. z-Score. Normalizes data points so that the average is 0 and the standard deviation is 1. The cumulative normal distribution (z-score) shows what percentage of a normal distribution is bounded by a given value of z. For one-sided distributions, this is the area from −∞ to z; two-sided values give the area between −z and +z. To illustrate this:

i. For a one-sided distribution, 97.72% of all data points will be less than 2 standard deviations above the average.

ii. For a two-sided distribution, 68.28% of all data points will be within ±1 standard deviation of the average.

$$z = \frac{x_i - \mu}{\sigma}$$

Table 1. Cumulative Normal Distribution. The area under a Gaussian distribution, where z is expressed in units of the population standard deviation (σ).

| z | p (1-sided) | p (2-sided) |
|---|---|---|
| 0 | .500 | 0.00 |
| 1 | .8414 | .6828 |
| 2 | .9772 | .9544 |
| 3 | .9986 | .9973 |
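A short sketch of both ideas using Python's standard `statistics.NormalDist`. `z_scores` normalizes a data set; when µ and σ are unknown it falls back to the sample average and sample standard deviation (an assumption that matches how Set #3 is treated in the Problem Set #1 discussion). The two helper functions compute the one- and two-sided areas tabulated in Table 1. All function names are illustrative, and the computed areas may differ from the table in the last digit because of rounding.

```python
from statistics import NormalDist, mean, stdev

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_scores(data, mu=None, sigma=None):
    """z = (x_i - mu) / sigma; falls back to the sample average and the
    sample standard deviation when mu or sigma is not supplied."""
    mu = mean(data) if mu is None else mu
    sigma = stdev(data) if sigma is None else sigma
    return [(x - mu) / sigma for x in data]


def p_one_sided(z):
    """Area under the standard normal curve from -infinity to z."""
    return STANDARD_NORMAL.cdf(z)


def p_two_sided(z):
    """Area under the standard normal curve between -z and +z."""
    return 2 * STANDARD_NORMAL.cdf(z) - 1


set3 = [11.892, 10.491, 10.172, 12.480, 11.095]
print([round(z, 4) for z in z_scores(set3)])

for z in range(4):
    print(z, round(p_one_sided(z), 4), round(p_two_sided(z), 4))
```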

e. Confidence Interval. The confidence interval is the preferred method for describing the range of uncertainty in a value. It is expressed as a range of uncertainty at a stated percent confidence, which reflects the percent certainty that the value lies within the stated range.

i. If the population standard deviation (σ) is known, the standard error of the mean ($\sigma_m$) combined with the z-score (from a table for the desired confidence level) is used to express the uncertainty in the mean as a range. This is the confidence interval at the stated certainty, and the percentage used should always be stated. This method is widely used to report results with a percent certainty and is expressed as follows:

$$\bar{x} \pm z\,\sigma_m \qquad \text{or} \qquad \bar{x} \pm \frac{z\,\sigma}{\sqrt{N}}$$

Table 2. Values of z for a given confidence level.

| Confidence Level (%) | z (2-sided) | z (1-sided) |
|---|---|---|
| 50 | 0.67 | 0.0 |
| 68 | 1.000 | 0.468 |
| 90 | 1.645 | 1.282 |
| 95 | 1.960 | 1.645 |
| 99 | 2.576 | 2.326 |
| 99.9 | 3.29 | 3.09 |

ii. If the population standard deviation (σ) is unknown, the sample standard deviation (s) may be used to estimate the confidence interval. This is the preferred method for reporting the uncertainty in experimental results. It takes into account the number of measurements made and the variance in the measurements, and it expresses the range at the stated percent confidence level.

$$\bar{x} \pm \frac{t\,s}{\sqrt{N}}$$

iii. Based upon the confidence interval calculated above, an experimental result should be expressed as:

5.3 ± 1.2 at the 95% confidence level

NOTE: Values for Student’s t are given in Table 3.
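A minimal sketch of both confidence-interval forms. `z_critical` computes z values like those in Table 2 from the standard normal distribution (two-sided: the stated fraction lies between −z and +z; one-sided: below z). Python's standard library has no Student's t quantile function, so `t_confidence_interval` takes t as an argument looked up in Table 3; applying it to Set #2 with the two-sided 95% value for N − 1 = 4 degrees of freedom (2.776) is an illustrative choice. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_critical(confidence, two_sided=True):
    """z that bounds the given fraction of a normal distribution:
    between -z and +z (two-sided) or below z (one-sided)."""
    p = 0.5 + confidence / 2 if two_sided else confidence
    return NormalDist().inv_cdf(p)


def t_confidence_interval(data, t):
    """Return (average, half-width) for x_bar +/- t*s/sqrt(N).
    t must be taken from a t-table for N - 1 degrees of freedom."""
    return mean(data), t * stdev(data) / sqrt(len(data))


set2 = [22.143, 22.640, 24.084, 23.135, 24.967]
avg, half_width = t_confidence_interval(set2, t=2.776)  # 95%, two-sided, 4 df
print(f"{avg:.2f} +/- {half_width:.2f} at the 95% confidence level")

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3), round(z_critical(level, two_sided=False), 3))
```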

3. Comparison Tests. These tests are used to compare averages to determine whether there is a significant difference between two values.

a. Comparing the sample to the true value, Method #1. The t-test is used to determine whether there is a significant difference between an experimental average and the population mean (µ) or "true value". This method is used to compare experimental results to quality control standards and standard reference materials. The comparison is based upon the confidence interval for the sample mean calculated above. If the difference between the measured value and the true value is greater than the uncertainty in the measurement, there is a significant difference between the two values at that confidence level. Expressed mathematically, if

$$|\bar{x} - \mu| \le \frac{t\,s}{\sqrt{N}}$$

then there is no significant difference at the stated confidence level. This could be stated as "there is no significant difference between the experimental results and the accepted value for the Standard Reference Material at the 95% confidence level."

b. Comparing the sample to the true value, Method #2. This is the same test as above, but it is often easier to understand the meaning of the test by calculating an experimental value for t ($t_{experimental}$). The experimental t-score is then compared to t-critical ($t_c$), the value of t found in a table. $t_{experimental}$ is calculated as follows:

$$t_{experimental} = \frac{|\bar{x} - \mu|}{s/\sqrt{N}}$$

There is a significant difference between the sample average and the true value if $t_{experimental}$ is greater than $t_c$. $t_c$ is chosen for N − 1 degrees of freedom at the desired percent confidence level. If the experimental value may be greater or less than the true value, use a two-sided t-score. If specifically testing for a significant increase or decrease (but not both), use a one-sided value for $t_c$.
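Method #2 as a short sketch, applied to Set #1 from Problem Set #1 with its true value of 24.0. The critical value is an assumption taken from Table 3: the two-sided 90% column at 20 degrees of freedom (1.725), the nearest tabulated row to the 19 degrees of freedom actually available. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_vs_true_value(data, mu):
    """Return (t_experimental, degrees of freedom) for comparing the
    sample average with a true value mu: |x_bar - mu| / (s / sqrt(N))."""
    n = len(data)
    t_exp = abs(mean(data) - mu) / (stdev(data) / sqrt(n))
    return t_exp, n - 1


set1 = [25.160, 25.227, 24.402, 23.924, 20.730, 23.615, 23.648, 23.747,
        23.613, 22.910, 25.075, 24.301, 24.611, 25.133, 24.152, 24.196,
        24.775, 23.841, 24.883, 25.561]

t_exp, dof = t_vs_true_value(set1, mu=24.0)
t_c = 1.725  # Table 3, two-sided 90%, 20 df (closest row to 19 df)
print(f"t_experimental = {t_exp:.3f} with {dof} df")
print("significant difference" if t_exp > t_c else "no significant difference")
```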

c. Comparing two experimental averages. The t-test may also be used to compare two experimental averages. This is most accurately done by using the pooled standard deviation and calculating $t_{experimental}$ as:

$$t_{experimental} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{pooled}\sqrt{\dfrac{1}{N_1} + \dfrac{1}{N_2}}}$$

If $t_{experimental}$ is greater than $t_{critical}$, then there is a significant difference between the two means. $t_{critical}$ is determined at the appropriate confidence level from a table of the t-statistic for N₁ + N₂ − 2 degrees of freedom.
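A sketch of the two-average comparison, applied to Sets #3 and #4 from Problem Set #1. It uses the fact that (N − 1)s² equals the sum of squared deviations about a set's average, so the pooled formula can be written with each set's sample variance. The function name is hypothetical; the critical value still comes from Table 3 (N₁ + N₂ − 2 = 9 degrees of freedom).

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(set_a, set_b):
    """Pooled-standard-deviation t-test for two averages.
    Returns (t_experimental, degrees of freedom)."""
    n1, n2 = len(set_a), len(set_b)
    dof = n1 + n2 - 2
    # (N - 1) * s^2 is the sum of squared deviations about each set's average
    s_pooled = sqrt(((n1 - 1) * variance(set_a) + (n2 - 1) * variance(set_b)) / dof)
    t_exp = abs(mean(set_a) - mean(set_b)) / (s_pooled * sqrt(1 / n1 + 1 / n2))
    return t_exp, dof


set3 = [11.892, 10.491, 10.172, 12.480, 11.095]
set4 = [17.656, 16.874, 17.999, 17.825, 19.525, 17.712]

t_exp, dof = two_sample_t(set3, set4)
t_c = 2.262  # Table 3, two-sided 95%, 9 df
print(f"t_experimental = {t_exp:.3f}, df = {dof}, significant: {t_exp > t_c}")
```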

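Table 3. $t_c$ for Normally Distributed Data. Each column heading gives the one-sided probability first and the equivalent two-sided probability second (P 1-sided / P 2-sided). The last row (∞ degrees of freedom) equals the normal-distribution z values.

| df | t.60 / t.20 | t.70 / t.40 | t.80 / t.60 | t.90 / t.80 | t.95 / t.90 | t.975 / t.95 | t.99 / t.98 | t.995 / t.99 |
|---|---|---|---|---|---|---|---|---|
| 1 | .325 | .727 | 1.376 | 3.078 | 6.314 | 12.71 | 31.82 | 63.66 |
| 2 | .289 | .617 | 1.061 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 |
| 3 | .277 | .584 | .978 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 |
| 4 | .271 | .569 | .941 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604 |
| 5 | .267 | .559 | .920 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 |
| 6 | .265 | .553 | .906 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707 |
| 7 | .263 | .549 | .896 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499 |
| 8 | .262 | .546 | .889 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355 |
| 9 | .261 | .543 | .883 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250 |
| 10 | .260 | .542 | .879 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169 |
| 20 | .257 | .533 | .860 | 1.325 | 1.725 | 2.086 | 2.528 | 2.845 |
| ∞ | .253 | .524 | .842 | 1.282 | 1.645 | 1.960 | 2.326 | 2.576 |

4. Q-test. Use to identify statistical outliers in data. This test should be applied sparingly and never more than once to a single data set.

$$Q_n = \frac{|x_a - x_b|}{R}$$

R is the range of all data points, $x_a$ is the suspected outlier, and $x_b$ is the data point closest to $x_a$. The suspected point may be rejected if $Q_n$ exceeds the tabulated value. At the 90% confidence level, the critical Q for N replicate measurements is: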

Table 4. Q-test decision level at the 90% confidence level.

| N | Q |
|---|---|
| 3 | .94 |
| 4 | .76 |
| 5 | .64 |
| 6 | .56 |
| 7 | .51 |
| 8 | .47 |
| 9 | .44 |
| 10 | .41 |
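A minimal sketch of the Q-test using the 90% decision levels from Table 4. It assumes the suspected outlier is either the smallest or the largest value, only covers N = 3 to 10 (the range of the table), and uses a hypothetical data set. The function name is illustrative.

```python
# Critical Q at the 90% confidence level (Table 4), keyed by N
Q_90 = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}


def q_test(data, suspect):
    """Return (Q, reject?) for a suspected outlier at either end of the data."""
    ordered = sorted(data)
    if suspect == ordered[-1]:
        nearest = ordered[-2]
    elif suspect == ordered[0]:
        nearest = ordered[1]
    else:
        raise ValueError("the suspected outlier must be the smallest or largest value")
    q = abs(suspect - nearest) / (ordered[-1] - ordered[0])  # gap / range
    return q, q > Q_90[len(data)]  # KeyError if N is outside 3..10


# Hypothetical replicate results with one high value
data = [10.1, 10.3, 10.2, 10.4, 11.5]
q, reject = q_test(data, 11.5)
print(f"Q = {q:.3f}, reject at 90%: {reject}")
```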

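5. Linear Regression. Fit the line y = mx + b to linear data, where x is the independent variable, y is the dependent variable, $x_i$ is the i-th data point, and N different standards are used. $y_{ave}$ is the average of the y values for the standards, and $x_{ave}$ is the average of the x values for the standards. This method assumes that there is no variance in the value of x.

$$s_{xx} = \sum (x_i - x_{ave})^2 \quad \text{or} \quad \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}$$

$$s_{yy} = \sum (y_i - y_{ave})^2 \quad \text{or} \quad \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{N}$$

$$s_{xy} = \sum (x_i - x_{ave})(y_i - y_{ave}) \quad \text{or} \quad \sum x_i y_i - \frac{\sum x_i \sum y_i}{N}$$

Slope and intercept:

$$m = \frac{s_{xy}}{s_{xx}} \qquad b = y_{ave} - m\,x_{ave}$$

Assuming a linear function and no replicates, the standard deviation about the regression is:

$$s_r = \sqrt{\frac{s_{yy} - m^2 s_{xx}}{N - 2}}$$

Uncertainty in $y_{predicted}$:

$$s_y = s_r\sqrt{1 + \frac{1}{N} + \frac{(x - x_{ave})^2}{s_{xx}}}$$

Uncertainty in $x_{predicted}$ for an unknown with an average signal $y_{unk}$ from M replicates:

$$s_x = \frac{s_r}{m}\sqrt{\frac{1}{M} + \frac{1}{N} + \frac{(y_{unk} - y_{ave})^2}{m^2 s_{xx}}}$$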
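A sketch of these regression formulas in plain Python. The calibration data are hypothetical, and the function names `calibrate` and `predict_x` are illustrative; together they compute the slope, intercept, $s_r$, and the concentration of an unknown with its $s_x$.

```python
from math import sqrt


def calibrate(x, y):
    """Least-squares line y = m x + b and the standard deviation about the regression."""
    n = len(x)
    x_ave, y_ave = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_ave) ** 2 for xi in x)
    s_yy = sum((yi - y_ave) ** 2 for yi in y)
    s_xy = sum((xi - x_ave) * (yi - y_ave) for xi, yi in zip(x, y))
    m = s_xy / s_xx
    b = y_ave - m * x_ave
    s_r = sqrt((s_yy - m ** 2 * s_xx) / (n - 2))
    return {"m": m, "b": b, "s_r": s_r, "n": n, "y_ave": y_ave, "s_xx": s_xx}


def predict_x(fit, y_unk, replicates=1):
    """Concentration of an unknown from its average signal over M replicates,
    with its standard deviation s_x."""
    m = fit["m"]
    x_unk = (y_unk - fit["b"]) / m
    s_x = (fit["s_r"] / m) * sqrt(
        1 / replicates + 1 / fit["n"] + (y_unk - fit["y_ave"]) ** 2 / (m ** 2 * fit["s_xx"])
    )
    return x_unk, s_x


# Hypothetical calibration standards (concentration, signal)
conc = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.1, 2.0, 4.1, 5.9, 8.0]

fit = calibrate(conc, signal)
x_unk, s_x = predict_x(fit, y_unk=3.0, replicates=3)
print(f"m = {fit['m']:.3f}, b = {fit['b']:.3f}, s_r = {fit['s_r']:.4f}")
print(f"unknown = {x_unk:.3f} +/- {s_x:.3f}")
```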

6. Error Analysis. The following techniques are used to determine how error propagates through an experimental procedure. This method is based upon combining the uncertainty for each step.

Table 5. Error Propagation.

| Calculation | Example | Standard Deviation |
|---|---|---|
| Addition and subtraction | $x = a + b - c$ | $s_x = (s_a^2 + s_b^2 + s_c^2)^{1/2}$ |
| Multiplication and division | $x = a \cdot b / c$ | $s_x = x\,[(s_a/a)^2 + (s_b/b)^2 + (s_c/c)^2]^{1/2}$ |
| Exponentiation | $x = a^b$ | $s_x = x\,b\,(s_a/a)$ (no uncertainty in b) |
| Logarithm | $x = \log_{10} a$ | $s_x = 0.434\,(s_a/a)$ |
| Logarithm | $x = \ln a$ | $s_x = s_a/a$ |
| Antilog | $x = \text{antilog}_{10}\,a$ | $s_x = 2.303\,x\,s_a$ |
| Antilog | $x = e^a$ | $s_x = x\,s_a$ |

EXAMPLE:

a. From a calibration curve, the concentration of an unknown is 16 ± 2 ppm.

b. The solution was prepared by:

i. dissolving 0.0452 ± 0.0001 g of compound

ii. in 250.0 ± 0.1 ml of water.

c. From this, the weight of unknown in the compound is:

$$x = 16\ \text{ppm} \times 250\ \text{ml} \times \frac{1\ \text{mg/l}}{1\ \text{ppm}} \times \frac{1\ \text{g}}{1000\ \text{mg}} \times \frac{1\ \text{l}}{1000\ \text{ml}} = 0.0040\ \text{g of unknown in the compound}$$

d. The uncertainty in this weight is:

$$s_x = x\left[\left(\frac{s_a}{a}\right)^2 + \left(\frac{s_b}{b}\right)^2 + \left(\frac{s_c}{c}\right)^2\right]^{1/2}$$

$$s_x = 0.0040\left[\left(\frac{2}{16}\right)^2 + \left(\frac{0.1}{250}\right)^2\right]^{1/2} = 0.0040\,(0.0156 + 0.00000016)^{1/2} = 0.0005$$

(NOTE: the concentration uncertainty is limiting.)

e. Therefore, the weight of the unknown is:

0.0040 ± 0.0005 g (or 4.0 ± 0.5 mg)
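The worked example can be checked with a short sketch of the multiplication/division rule from Table 5. The helper name is hypothetical; each input is passed as a (value, standard deviation) pair, and the unit-conversion factors are treated as exact.

```python
from math import sqrt


def propagate_product(result, *terms):
    """s_x for x = a * b / c ...: |x| times the root-sum-square of the
    relative uncertainties s_i / v_i of each (value, std. dev.) term."""
    return abs(result) * sqrt(sum((s / v) ** 2 for v, s in terms))


conc = (16.0, 2.0)     # ppm (mg/l) from the calibration curve
volume = (250.0, 0.1)  # ml

# 16 mg/l * 250 ml * (1 l / 1000 ml) * (1 g / 1000 mg); conversion factors are exact
mass_g = conc[0] * volume[0] / 1000 / 1000
s_mass = propagate_product(mass_g, conc, volume)
print(f"{mass_g:.4f} +/- {s_mass:.4f} g")
```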


## Statistics Problem Set #1

ENVR 303, 1995. Dr. Van Bramer

1. Use the following data sets for the calculations in this problem set. These data have a random normal distribution. All values are concentrations (ppb) for replicate samples:

a. Set #1, a large sample set: 25.160, 25.227, 24.402, 23.924, 20.730, 23.615, 23.648, 23.747, 23.613, 22.910, 25.075, 24.301, 24.611, 25.133, 24.152, 24.196, 24.775, 23.841, 24.883, 25.561

b. Set #2, a small sample set: 22.143, 22.640, 24.084, 23.135, 24.967

c. A pair of small data sets:

i. Set #3: 11.892, 10.491, 10.172, 12.480, 11.095

ii. Set #4: 17.656, 16.874, 17.999, 17.825, 19.525, 17.712

2. Descriptive Statistics.

a. Determine the average and standard deviation for each data set.

b. Determine the standard error of the mean for each data set.

c. Determine the following confidence intervals (two-sided) for each data set: 50%, 68%, 90%, 95%, 99%, and 99.9%.

3. Comparison Tests.

a. For Set #1 and Set #2, compare the means and determine whether there is a significant difference between the two sets at the 90, 95, and 99% confidence levels.

b. For Set #3 and Set #4, compare the means and determine whether there is a significant difference between the two sets at the 90, 95, and 99% confidence levels.

c. For the experiments with Set #3 and Set #4, I am really interested in finding out whether Set #4 is significantly larger than Set #3. Determine whether this is true at the 90, 95, and 99% confidence levels. Think about what this means in terms of the t-score.

d. The true values for the data sets are:

i. Set #1: 24.0

ii. Set #2: 24.0

iii. Set #3: 12.0

iv. Set #4: 17.5

Determine whether there is a significant difference between the true value and the measured value for each set at the 80, 90, 95, and 99% confidence levels.

4. z-Score. For Set #1, normalize the data by z-scoring.

5. Rejection of outliers. Use the Q-test to determine whether 19.525 is an outlier in the second small data set (Set #4).

## Statistics Problem Set #2

ENVR 303, 1995. Dr. Van Bramer

1. Given the following data for mercury concentration in a soil sample, construct a calibration curve and use linear regression to determine the unknown concentration. The sample was prepared by weighing 1.3452 g of the soil on an analytical balance, extracting the mercury, and diluting the extract to 100.0 mL in a class A volumetric flask.

a. Based upon propagation of error, what is the concentration of mercury in the soil sample?

b. How would you verify the accuracy of this determination?

c. Based on the replicate samples, what are the different LODs? How many replicate samples would be required to measure a 0.001 ppm sample with 10% RSD?

| Concentration (ppm) | Signal, log(P₀/P) |
|---|---|
| 0 | 0.02599 |
| 0.001 | 0.03544 |
| 0.002 | 0.03447 |
| 0.005 | 0.05885 |
| 0.01 | 0.04349 |
| 0.05 | 0.23143, 0.22492, 0.22656, 0.27000 |
| 0.092 | 0.41289 |
| 0.124 | 0.55200 |
| unknown | 0.21535 |

2. An unknown sample was prepared for analysis by inductively coupled plasma emission. Solution A: 0.0153 ± 0.0001 g of unknown was weighed out on an analytical balance, dissolved, and diluted to 250.0 ± 0.1 ml in a volumetric flask. Solution B: 100.0 ± 0.2 ml of solution A was transferred to a second flask and diluted to 250.0 ± 0.1 ml. Solution C: 0.0021 ± 0.0001 g of iron was dissolved, 100.0 ± 0.2 ml of solution A was added, and the solution was diluted to 250 ml. Given the following data, what is the weight percent of Fe in the unknown sample? Based upon the replicate samples and propagation of error, what is the uncertainty in this value? Report your result ± the 90% confidence interval using the t-test. (10 points)

| | Blank | Solution B | Solution C |
|---|---|---|---|
| Trial 1 | 0.788847 | 8.799929 | 21.71420 |
| Trial 2 | 0.835551 | 8.880910 | 21.92304 |
| Trial 3 | 0.996183 | 8.850854 | 21.92695 |

## Problem Set #1, Discussion

S.E. Van Bramer

March 16, 1995

ENVR 303

I have already posted some of the answers to the statistics problem set, but I also want you to spend some time thinking about what those answers mean. In an effort to address that, I am writing this document.

1. Data sets: What you see here is just a list of numbers. What do these numbers correspond to? What are they a measure of? When you apply statistical tests to your own data, you need to think about this. It is important because it determines what you are measuring the uncertainty of. For example, let's say that Set #1 is a measurement of lead in water:

a. If the set represents samples from 20 different locations in a lake, then you are measuring the variability in the lake at the time you took the samples.

b. If the set represents samples of tap water taken on 20 consecutive days, then you are measuring the day-to-day variability of the tap water.

c. If the set represents samples of tap water taken one right after the next, then you are measuring the variability of the tap water over this time frame.

d. If you took a single water sample and measured that sample 20 times, then you are measuring the variability of the analysis.

All of these measurements represent very different types of information. I want to make two points. First, think carefully about what it is that you want to find out, and design your experiment accordingly. Each of the above experiments provides distinctly different information. What are you interested in? Second, when you describe your experiment to someone else, be careful about what you say. Say what you did and what you measured. Do not make a claim for a type of information that you did not obtain. For example, from experiment d above, you can say that the concentration of lead in your sample was x ± s, but you do not know anything about how much that sample will vary from one taken a month later at the same place.

2. Descriptive Statistics: These are just statistical tools that you use to represent your data set. They are a shorthand to make life easier. Instead of having to list all twenty values from Set #1, you can describe the set with statistics. When you do this, be certain to state exactly which statistics you used so that someone else can tell what you did.

a. POOR: Set #1 is 24.175 ± 1.066.

b. Better: Set #1 is 24.175 ± 1.066 (sample standard deviation from 20 replicate measurements).

c. Best: The average of Set #1 is 24.175 ± 0.467 at the 95% confidence level.

3. Comparison Tests: These statistical tools are used to compare two things to see if there is a significant difference. You cannot tell whether two sets are the same, only whether they are "significantly different". Some examples of what is found:

a. Comparing Set #1 to the "true value": $t_{experimental}$ = 0.735. Then compare this to $t_c$ from the table. Since you are comparing for a difference, this is a two-sided test. You have 20 samples, so you have 19 degrees of freedom. Looking in the table that I gave you, the closest that you can find is 20 degrees of freedom (pretty close; if you want to do better, interpolate between the values given or go to a bigger table in a different source). From the table, $t_c$ = 1.725 (90%, two-sided). From this you can say "Based upon this experiment, there is no significant difference between Set #1 and the accepted value at the 90% confidence level."

b. Comparing Set #3 and Set #4. Now you are comparing two experimental values. First, you can improve your estimate of the population standard deviation by pooling the standard deviations from each data set. Then you may use this pooled standard deviation to determine $t_{experimental}$ = 12.146. From the table, find $t_c$ (for N₁ + N₂ − 2 degrees of freedom) = 2.262 (95%, two-sided). Based upon this you can say "There is a significant difference between Set #3 and Set #4 at the 95% confidence level."

c. Alternatively, if you expect Set #4 to be larger, you may use a one-sided $t_c$ = 1.833 (95%, one-sided). Based upon this you can say "Set #4 is significantly larger than Set #3 at the 95% confidence level."

4. z-Score: Just to review from class, using Set #3 as an example, what I was looking for here could be shown as:

| Sample | z-score |
|---|---|
| 11.892 | 0.6939 |
| 10.491 | -0.7658 |
| 10.172 | -1.0982 |
| 12.480 | 1.3066 |
| 11.095 | -0.1365 |
| Avg 11.226 | 0… |
