Intro to Bayesian Linear Mixed Effect Models with rstanarm

Apr 4, 2017 15 min read R

Overview of Presentation

Bayesian vs. frequentist inference
Bayes’ Theorem
Example of Bayesian inference
Bayesian LMERs with rstanarm
how to code models
selecting priors
displaying & interpreting the posterior distribution
model diagnostics and comparisons

Bayesian vs. Frequentist Inference

Frequentist Inference

Uses only the data and compares it to an idealized model to make inferences about the data
Example Problem: You lose your cellphone in your house. You have a friend call it and you listen for the sound to find it. Where do you search in the house?
Frequentist solution: You can hear the phone ringing. You also have a mental model of the house that helps you identify where the sound is coming from. When you hear the phone ringing, you infer the location with the help of your model.

Bayesian Inference

Draws conclusions from data and all prior knowledge about the data including idealized models
Uses prior knowledge to make bets on the outcome
Bayesian solution: You can hear the phone ringing but you also know where you were today in the house and where you tend to lose things. You also have a mental model of the house. You combine the inferences from the ringing with your prior knowledge to find the phone

Bayes’ Theorem

\(P(A\mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}\)
The probability of observing event A given that B is true equals the probability of observing event B given the A is true times the probability of observing event A divided by the probability of event B
Can be reformulated as Bayes’ Rule:
Prior Odds * Relative Likelihoods = Posterior Odds

Bayesian Terms

Prior odds = our beliefs about the events prior to looking at the new data
Relative likelihoods = how likely an observation is when comparing one hypothesis to another
Posterior odds = our state of belief after comparing the prior information to the likelihoods (and updating with new data)

What they look like

Applying Bayes’ Theorem: Example Problem

You are treating students with Diseasitis
20% of students will have Diseasitis right now based on previous research
You have a test that turns black if a student has Diseasitis
If a student has Diseasitis, only 90% cause the test to turn black
The test also turns black 30% of the time for healthy students
A student comes in to take the test and it turns black. What is the probability they have Diseasitis?

Solution

We know
The probability of black given sick = 0.9
The probability of black given not sick = 0.3
The probability of sick = 0.2
Prior odds X Relative likelihoods = Posterior odds
Relative likelihoods compare the probabilities of the two observed outcomes (prob of black = sick and prob of black = healthy)
\(\frac{P(sick)}{P(healthy)} \times \frac{P(black\mid sick)}{P(black\mid healthy)} = \frac{P(sick\mid black)}{P(healthy\mid black)}\)
\((20\% : 80\%) \times (90\%:30\%) = \frac{3}{4}\) odds or \(\frac{3}{7} =43\%\) probability of sick given blackened test
Remember: odds \((A:B)\) = probability \(\frac{A}{A+B}\)

Solution (con’t)

Another way to look at it:
Imagine a waterfall with two different colored waters
At the top of the waterfall, 20 gallons/second of red water are flowing down, and
80 gallons/second of blue water are coming down.
90% of the red water makes it to the bottom.
30% of the blue water makes it to the bottom.
Of the purplish water that makes it to the bottom of the pool, how much was originally from the red stream and how much was originally from the blue stream?

Solution visualized

Applying Bayesian inference to LMERs

Applying Bayesian Inference

Bayesian inference can be applied to most models (ANOVA, linear regression, GLMERs, LMERs, GAMMs, CLMMs, etc.)
Bayesian inference does not give p-values (more on this later)
It can run models that otherwise don’t converge with frequentist models
Easier to intrepret
Provide distribution of values for each parameter rather than a single output

Applying Bayesian Inference

Real life language data is much more complex than the earlier examples
We can’t directly calculate the posterior distribution
Instead we have to approximate accurate posterior distributions
The rstanarm package uses the Stan language to generate a posterior distribution via Markov chain Monte Carlo sampling
This process takes time (sometimes a long time!) and requires thousands of samples
Stan is much more efficient than BUGS or JAGS (other Bayesian samplers) and is in active development

rstanarm

Use library(rstanarm)
stan_lmer() or stan_glmer()
Similar syntax to lme4 but have to specify
priors
number of chains
iterations
adapt_delta (optional)

Selecting priors

No matter what we research, we already know something about the data we will collect before we do
For phonetics, we know range of sounds human can produce and hear
We know that it is very unlikely to get meaningful speech sounds that are 100,000 Hz
VOTs are typical not seconds long or conversely not extremely short like 1 ns
We pick priors that are weakly informative but still helpful
We want to make some bets but not bet too strongly to overly influence the results
We pick priors that are distributions (typically normal) that have a reasonable mean and standard deviation but not too narrow nor too wide

Selecting priors

The priors tell the model where to start guessing at the data and how strong of a bet to make
If we have lots of data that strongly diverge from the priors, then the results won’t reflect the priors as much, but if the priors are too strong then they can heavily influence our results

VOT example

We need to pick priors for the intercept and slopes for VOT data
What do we know about VOT that are measured in ms?
We can pick a prior with a normal distribution centered at 0ms with a standard deviation of 50ms: normal(0,50)
that says we are 68% sure the mean is between -50ms and 50ms, 95% sure between -100ms and 100ms and 99.7% sure between -150ms and 150ms
We can tweak it as needed
What if we said normal(0,5)? Is that too strong? Or normal(0,500)? Is that too weak?

Sampling the posterior distribution

We need to specify how many samples we need in addition to the number of chains
Stan is quite efficient so we can use a relatively small amount depending on how nice the data are
Typically use chains = 4 (number of unique samplers) and iter = 2000
The sampling process requires warmup samples to get a taste of the distribution, which we throw out since not very good (warmup = 1000)
Sometimes there are divergent samples which are bad so we add adapt_delta=0.999 (if still divergent keep adding 9s until it works)

Let’s make some models!

These can take a bit of time to run

data_vot <- read_csv("t_d_vot.csv") # import data
options(mc.cores=parallel::detectCores ()) # Run on multiple cores
# strong priors
model1 <-stan_lmer(vot*1000 ~ consonant2 + (1+consonant2|speaker_id) + (1+consonant2|word),
                   data=data_vot,
                   prior_intercept = normal(0, 5,autoscale = F),
                   prior = normal(0, 5,autoscale = F), 
                   prior_covariance = decov(regularization = 2),
                   chains = 4,
                   iter = 2000, adapt_delta=0.999,warmup=1000)

More models

# weaker priors
model2 <-stan_lmer(vot*1000 ~ consonant2 + (1+consonant2|speaker_id) + (1+consonant2|word),
                   data=data_vot,
                   prior_intercept = normal(0, 50,autoscale = F),
                   prior = normal(0, 50,autoscale = F), 
                   prior_covariance = decov(regularization = 2),
                   chains = 4,
                   iter = 2000, adapt_delta=0.999,warmup=1000)

One last model

# very weak priors
model3 <-stan_lmer(vot*1000 ~ consonant2 + (1+consonant2|speaker_id) + (1+consonant2|word),
                   data=data_vot,
                   prior_intercept = normal(0, 500,autoscale = F),
                   prior = normal(0, 500,autoscale = F), 
                   prior_covariance = decov(regularization = 2),
                   chains = 4,
                   iter = 2000, adapt_delta=0.999,warmup=1000)

Interpreting Results

What are p-values?

From http://www.labstats.net/articles/pvalue.html
The probability of getting the results you did (or more extreme results) given that the null hypothesis is true.
P = 0.05 does not mean there is only a 5% chance that the null hypothesis is true.
P = 0.05 does not mean there is a 5% chance of a Type I error (i.e. false positive).
P = 0.05 does not mean there is a 95% chance that the results would replicate if the study were repeated.
P > 0.05 does not mean there is no difference between groups.
P < 0.05 does not mean you have proved your experimental hypothesis.

Intrepreting p-values

If you are studying the difference in VOT between two stops and the p-value is 0.03 that means:
if there were no difference between the VOT of the two stops, you’d get the observed difference or more in 3% of the studies due to random sampling error.
It does not give us any information about the alternate hypothesis that there is a difference between the stops’ VOTs.

Bayesian Approach

No p-values!
Use Bayesian Highest Density Interval (HDI)
Area of posterior distribution with 95% highest probability [2.5% to 97.5% of posterior distribution]
Bayesian approach answers “The probability of the Null Hypothesis being true given the data” (Frequentist is “The probability of observing the data or anything more extreme given the null hypothesis”)
Bayesian approach is more inuitive
Multiple comparisons also aren’t an issue!

## Intrepreting HDI

If there is no overlap of HDIs, then probable difference
If overlap can say there is probable similarity but also probable difference (can now make these claims rather than just binary answers!)
Much more nuanced and like real life

Model 1 Results

	mean	mcse	sd	2.5%	25%	50%	75%	97.5%	n_eff	Rhat
(Intercept)	4.0864961	0.1266263	5.3324124	-6.4999536	0.3921561	4.2270555	7.876507	14.26621	1773	1.0002057
consonant2t̻	1.8167415	0.1058488	4.9799389	-8.2129145	-1.4335593	1.8672895	5.128587	11.62017	2213	1.0018490
b[(Intercept) speaker_id:F01]	6.1571967	0.3203016	6.6888620	-5.5604391	1.1397537	5.8466621	10.632572	19.77066	436	1.0066978
b[consonant2t̻ speaker_id:F01]	9.2544098	0.8048084	13.0114306	-12.3259024	-1.6638371	8.2141599	20.560041	32.78133	261	1.0114576
b[(Intercept) speaker_id:F02]	-0.1790354	0.3008634	6.5528716	-11.9120906	-4.7962143	-0.5556100	4.051550	13.67538	474	1.0046269
b[consonant2t̻ speaker_id:F02]	21.2263520	0.8436341	13.4194616	-0.6464893	9.8916638	19.9318661	32.997877	44.63413	253	1.0122115
b[(Intercept) speaker_id:M01]	20.8152190	0.3496184	7.1102001	8.1548843	15.6766691	20.6613122	25.518407	35.08725	414	1.0086930
b[consonant2t̻ speaker_id:M01]	3.1611167	0.7882159	12.9473913	-19.1680871	-7.3671564	1.7137774	14.332284	25.81529	270	1.0095255
b[(Intercept) speaker_id:M02]	3.8957675	0.3192612	6.7343050	-7.9658212	-0.9439335	3.3301208	8.467740	17.98618	445	1.0066537
b[consonant2t̻ speaker_id:M02]	6.6532391	0.8065003	12.9791396	-14.9735835	-4.1659549	4.9942416	17.983205	29.81430	259	1.0111424
b[(Intercept) speaker_id:M03]	8.3188733	0.3107234	6.7029312	-3.5148628	3.5766378	7.8081863	12.810630	22.36286	465	1.0058295
b[consonant2t̻ speaker_id:M03]	30.0009369	0.8610039	13.8191725	7.5520796	18.1920717	28.9177002	42.115818	54.17208	258	1.0126060
b[(Intercept) word:ahdan]	1.4513173	0.1948404	5.2998150	-7.2890099	-1.2174846	0.1475008	3.250404	14.80495	740	1.0052302
b[consonant2t̻ word:ahdan]	0.4205744	0.2205275	13.9473834	-30.6421033	-3.4534170	0.0435104	4.419632	31.09333	4000	0.9995036
b[(Intercept) word:ata]	5.9983941	0.4610827	11.1459093	-7.4882486	-0.3187069	1.6672417	10.255426	35.63936	584	1.0064080
b[consonant2t̻ word:ata]	10.5945488	0.6598986	13.4471019	-4.8164844	0.0128475	5.6905976	19.930322	41.17283	415	1.0053769
b[(Intercept) word:idi]	3.1229898	0.2239212	5.5562291	-5.1150601	-0.1005295	1.3810099	5.539898	17.20763	616	1.0051051
b[consonant2t̻ word:idi]	0.9119649	0.2231316	13.9709089	-29.2847326	-3.0776975	0.0763752	4.750007	32.42595	3920	1.0000484
b[(Intercept) word:itik]	7.0288940	0.4652301	11.2837918	-6.8164340	0.0649321	2.6536150	11.458345	36.75228	588	1.0043226
b[consonant2t̻ word:itik]	12.4591581	0.7015108	13.8732923	-3.1514511	0.6738726	7.9105136	22.269547	43.61696	391	1.0066493
sigma	6.3583149	0.0131140	0.6890298	5.1838229	5.8768264	6.2991658	6.749955	7.93253	2761	0.9994410
Sigma[speaker_id:(Intercept),(Intercept)]	196.8932077	6.2667480	181.8820568	27.6287381	77.7356439	139.5448053	251.700022	715.22591	842	1.0038181
Sigma[speaker_id:consonant2t̻,(Intercept)]	48.7473253	5.8044110	136.4071223	-148.4347459	-25.7137070	15.9920358	98.179490	403.26124	552	1.0049717
Sigma[speaker_id:consonant2t̻,consonant2t̻]	422.3109484	20.5367610	424.3029914	48.7769109	133.6042056	281.1530738	567.852549	1577.86463	427	1.0067827
Sigma[word:(Intercept),(Intercept)]	114.0889282	8.9331102	216.6859959	0.0340652	3.2356146	25.8367294	130.758542	747.66508	588	1.0048817
Sigma[word:consonant2t̻,(Intercept)]	16.6823899	2.0121354	95.1906538	-146.4042270	-1.7539351	0.6144948	29.869396	229.08762	2238	1.0003819
Sigma[word:consonant2t̻,consonant2t̻]	221.8939850	14.5030489	360.6208258	0.0355739	5.2610660	79.1953082	297.683633	1220.99980	618	1.0069617
mean_PPD	30.6615249	0.0183833	1.1626600	28.4549208	29.8770106	30.6501532	31.441122	32.93101	4000	0.9998124
log-posterior	-239.0927218	0.1498344	4.9706903	-249.6491694	-242.1890713	-238.6483981	-235.576336	-230.67157	1101	1.0002455

Model 2 Results

	mean	mcse	sd	2.5%	25%	50%	75%	97.5%	n_eff	Rhat
(Intercept)	14.2244977	0.1336854	5.2994262	3.5171636	11.2271482	14.2037311	17.2310756	24.692964	1571	1.0023898
consonant2t̻	30.8076631	0.1740300	6.9720050	16.3760843	26.7012549	30.9575055	34.8718284	45.069972	1605	0.9998558
b[(Intercept) speaker_id:F01]	-1.5847485	0.1321248	5.2930367	-12.2908050	-4.6494473	-1.4603011	1.5166600	8.785738	1605	1.0024359
b[consonant2t̻ speaker_id:F01]	-3.9668643	0.1436126	6.2875952	-16.7081135	-7.8130802	-4.0096678	-0.1371577	9.281690	1917	0.9994917
b[(Intercept) speaker_id:F02]	-7.7123341	0.1279961	5.3077925	-18.6121020	-10.7970832	-7.6386131	-4.4249411	2.598526	1720	1.0024501
b[consonant2t̻ speaker_id:F02]	7.6285262	0.1471050	6.4480219	-4.7718606	3.5596735	7.5109740	11.6611606	20.522699	1921	0.9994883
b[(Intercept) speaker_id:M01]	12.8008224	0.1310955	5.3713873	2.5757568	9.4440654	12.6992028	15.9629391	23.850700	1679	1.0034656
b[consonant2t̻ speaker_id:M01]	-9.9693452	0.1544271	6.4577728	-22.7137477	-14.0001527	-9.8286021	-5.7531024	2.600898	1749	0.9993977
b[(Intercept) speaker_id:M02]	-3.8539120	0.1281660	5.3443397	-14.9163100	-6.9758851	-3.7815376	-0.5726020	6.610812	1739	1.0029794
b[consonant2t̻ speaker_id:M02]	-6.5934511	0.1539033	6.4685281	-19.7311775	-10.5801402	-6.5056020	-2.6515402	6.436070	1767	0.9996702
b[(Intercept) speaker_id:M03]	0.7158164	0.1308474	5.2755550	-9.8673849	-2.3771565	0.8101970	3.7637536	11.396687	1626	1.0039424
b[consonant2t̻ speaker_id:M03]	16.1012277	0.1451787	6.4140828	4.5652059	11.9645594	15.7970588	19.8361816	30.198882	1952	1.0002797
b[(Intercept) word:ahdan]	-0.5716025	0.0418079	2.2408968	-5.5064846	-1.3208421	-0.2238359	0.2765287	3.506637	2873	1.0015551
b[consonant2t̻ word:ahdan]	-0.0181501	0.0608711	3.6897752	-7.6544970	-0.8552698	0.0095352	0.8465419	7.144780	3674	1.0003937
b[(Intercept) word:ata]	-0.3529255	0.0550845	3.0248305	-6.1382996	-1.1951815	-0.1546491	0.4256046	5.405387	3015	0.9997561
b[consonant2t̻ word:ata]	-0.3544991	0.0629966	3.3135271	-6.8762164	-1.2441720	-0.1440967	0.4161509	6.142704	2767	1.0001688
b[(Intercept) word:idi]	0.4486629	0.0404152	2.2485387	-3.7808842	-0.3332448	0.1691477	1.1503904	5.489988	3095	1.0003343
b[consonant2t̻ word:idi]	-0.0364130	0.0576910	3.6486997	-7.1641064	-0.8653133	-0.0166922	0.7944696	7.419529	4000	1.0001989
b[(Intercept) word:itik]	0.5738092	0.0496346	2.7357179	-3.9423128	-0.3220993	0.2103334	1.2833743	6.892119	3038	1.0007781
b[consonant2t̻ word:itik]	0.7125528	0.0680903	3.3760831	-4.8212693	-0.3134421	0.2605675	1.4169209	8.584818	2458	1.0001577
sigma	6.2406426	0.0113055	0.6601288	5.0908638	5.7880519	6.1877234	6.6341432	7.708066	3409	1.0003974
Sigma[speaker_id:(Intercept),(Intercept)]	109.3857156	2.9714416	119.3366409	21.8579352	49.4385033	78.9264339	129.9687177	357.017007	1613	1.0017576
Sigma[speaker_id:consonant2t̻,(Intercept)]	-23.9270888	1.0654360	55.3921423	-155.7941350	-44.4166834	-18.1984745	2.4683263	77.188817	2703	0.9996994
Sigma[speaker_id:consonant2t̻,consonant2t̻]	159.3992785	2.8662409	132.4901999	38.2243540	82.3267490	123.0551629	194.2214645	481.567991	2137	0.9997013
Sigma[word:(Intercept),(Intercept)]	10.1829985	0.7297086	33.2699315	0.0045218	0.4306496	2.0892145	7.6793833	65.435740	2079	1.0002724
Sigma[word:consonant2t̻,(Intercept)]	-1.3140427	0.3119016	17.1061765	-21.6688631	-0.7868898	-0.0021438	0.4689946	11.632506	3008	0.9998489
Sigma[word:consonant2t̻,consonant2t̻]	16.2181014	1.0468991	54.8412758	0.0049246	0.4333007	2.4546309	10.0248617	132.629700	2744	0.9998889
mean_PPD	30.8268879	0.0181754	1.1495141	28.5680200	30.0632324	30.8350506	31.5965404	33.052110	4000	0.9995367
log-posterior	-238.7857730	0.1290156	4.6788612	-248.6780425	-241.6852363	-238.4362011	-235.4064512	-230.728938	1315	1.0008418

Model 3 Results

	mean	mcse	sd	2.5%	25%	50%	75%	97.5%	n_eff	Rhat
(Intercept)	13.9642848	0.1255139	5.2862739	3.1361188	11.0509740	13.9311869	17.0777707	24.691811	1774	0.9995626
consonant2t̻	32.0826482	0.1900716	6.8916103	18.0067462	27.9681689	32.0435970	36.0816424	46.336768	1315	1.0033702
b[(Intercept) speaker_id:F01]	-1.5302904	0.1213132	5.1595532	-12.2904876	-4.5585220	-1.5071204	1.5791178	8.610477	1809	0.9996977
b[consonant2t̻ speaker_id:F01]	-4.7768015	0.1594894	6.3289503	-17.7257696	-8.7602367	-4.7047401	-0.8222776	7.737750	1575	1.0022687
b[(Intercept) speaker_id:F02]	-7.6194908	0.1193882	5.2272817	-18.8686135	-10.7747593	-7.4814266	-4.3411413	2.323210	1917	1.0000940
b[consonant2t̻ speaker_id:F02]	6.7769788	0.1547371	6.2980373	-5.6811122	2.7948426	6.6968777	10.6883551	19.593682	1657	1.0020555
b[(Intercept) speaker_id:M01]	12.8219215	0.1269118	5.2120165	2.6440517	9.5812362	12.6441052	15.9817349	23.732545	1687	1.0001472
b[consonant2t̻ speaker_id:M01]	-10.6726342	0.1551055	6.4239402	-23.8206246	-14.4838725	-10.4911580	-6.6159101	1.476965	1715	1.0021635
b[(Intercept) speaker_id:M02]	-3.7475418	0.1178971	5.2498634	-13.9676660	-6.8412054	-3.7694361	-0.5812705	6.885174	1983	1.0005591
b[consonant2t̻ speaker_id:M02]	-7.4626967	0.1438520	6.4531613	-20.3668920	-11.4871342	-7.4374432	-3.2859368	5.159041	2012	1.0026973
b[(Intercept) speaker_id:M03]	0.8052862	0.1194252	5.1499723	-9.7182341	-2.3057735	0.8143507	3.9433914	11.058106	1860	0.9998734
b[consonant2t̻ speaker_id:M03]	15.1600690	0.1717555	6.4186071	2.5838749	11.0604312	14.8079068	19.1281916	28.194856	1397	1.0017766
b[(Intercept) word:ahdan]	-0.4639709	0.0384139	2.2340726	-5.4803073	-1.2297703	-0.1988099	0.2850377	4.002418	3382	0.9997312
b[consonant2t̻ word:ahdan]	0.1216502	0.0598031	3.7822814	-7.2350099	-0.8235421	0.0063251	0.9773072	7.849380	4000	0.9993267
b[(Intercept) word:ata]	-0.4843028	0.0456596	2.7196091	-6.4657134	-1.2464827	-0.1866374	0.3818759	4.504693	3548	0.9992919
b[consonant2t̻ word:ata]	-0.4672058	0.0546052	2.9632833	-6.5674941	-1.3714779	-0.2005452	0.3459040	5.538544	2945	1.0001976
b[(Intercept) word:idi]	0.6035102	0.0401485	2.2781105	-3.3654478	-0.2866984	0.2329395	1.3053576	6.019724	3220	0.9991333
b[consonant2t̻ word:idi]	-0.0003454	0.0738116	4.0638324	-8.3097916	-0.8763403	0.0043577	0.8040879	7.669826	3031	1.0004483
b[(Intercept) word:itik]	0.4689619	0.0459315	2.6030958	-4.5123515	-0.3572086	0.1811649	1.2375526	6.059717	3212	1.0012347
b[consonant2t̻ word:itik]	0.5790129	0.0597638	3.0314378	-4.7012075	-0.3587379	0.1818185	1.3865031	7.569620	2573	0.9991393
sigma	6.2668376	0.0107517	0.6799999	5.0720898	5.7877607	6.2122055	6.6877624	7.694882	4000	1.0000970
Sigma[speaker_id:(Intercept),(Intercept)]	109.4096891	2.1925481	97.1614838	22.3947189	51.0143263	81.5969220	132.6112301	358.757366	1964	1.0003071
Sigma[speaker_id:consonant2t̻,(Intercept)]	-22.7260599	1.0912958	57.8124972	-146.4107324	-44.2173930	-17.2813898	2.9971406	80.936666	2806	1.0020638
Sigma[speaker_id:consonant2t̻,consonant2t̻]	160.0292421	2.6974995	131.2439131	35.6382993	82.3131742	124.1417299	194.7151863	490.847653	2367	1.0006840
Sigma[word:(Intercept),(Intercept)]	9.9382626	0.5093796	28.1101736	0.0034933	0.3980251	2.0886674	7.9939197	76.805961	3045	0.9993027
Sigma[word:consonant2t̻,(Intercept)]	-0.7197121	0.1922373	12.1581554	-16.6219774	-0.8116888	-0.0036399	0.4309056	11.617087	4000	0.9995805
Sigma[word:consonant2t̻,consonant2t̻]	14.7685657	1.0254484	55.3351743	0.0028683	0.4427945	2.5722091	10.4894988	105.721342	2912	1.0001081
mean_PPD	30.8019709	0.0185724	1.1746229	28.4877981	30.0332389	30.7937003	31.5746953	33.088275	4000	0.9999336
log-posterior	-240.5881277	0.1536367	4.9786412	-251.3596254	-243.7667338	-240.2288973	-237.1560916	-231.899535	1050	1.0020654

Compared to frequentist model

model_freq <- lmer(vot*1000 ~ consonant2 + (1+consonant2|speaker_id) + (1+consonant2|word),data=data_vot)
sjPlot::sjt.lmer(model_freq)

	vot * 1000
	B	CI	p
Fixed Parts
(Intercept)	14.21	6.49 – 21.94	.020
consonant2 (t̻)	31.84	20.59 – 43.09	.004
Random Parts
σ²	36.684
τ_{00, speaker_id}	70.586
τ_{00, word}	0.143
ρ₀₁	-0.499
N_{speaker_id}	5
N_word	4
ICC_{speaker_id}	0.657
ICC_word	0.001
Observations	58
R² / Ω₀²	.916 / .916

	vot * 1000
	B	CI	p
Fixed Parts
(Intercept)	14.21	6.49 – 21.94	.020
consonant2 (t̻)	31.84	20.59 – 43.09	.004
Random Parts
σ²	36.684
τ_{00, speaker_id}	70.586
τ_{00, word}	0.143
ρ₀₁	-0.467
N_{speaker_id}	5
N_word	4
ICC_{speaker_id}	0.657
ICC_word	0.001
Observations	58
R² / Ω₀²	.916 / .916

Plotting Posterior Distribution: Model1

Plotting Posterior Distribution: Model2

Plotting Posterior Distribution: Model3

Diagnostics and Comparing Models

Model Diagnostics

In summary output want Rhat to be 1.0 or very close to 1 (like 0.98) [measure of how well the chains mix]
Use launch_shinystan(model1) to explore Stan’s built in diagnostic tools (launches page in your web browser)

Example of Shinystan

How well did the model sample

Model Comparisons

Like frequentist models, we can compare two different models to see how different they are (like comparing random effects)
Instead of anova() we use loo() (leave one out) and compare_models()
Can take a little bit to run
loo() is actually more like comparing AIC values
If elpd_diff is small, then little difference between models
loo is an approximation of k_fold(model,K=10) but is much faster

Loo

The LOO Information Criterion (LOOIC) has the same purpose as the Akaike Information Criterion (AIC) that is used by frequentists.
Both are intended to estimate the expected log predictive density (ELPD) for a new dataset.
However, the AIC ignores priors and assumes that the posterior distribution is multivariate normal, whereas the functions from the loo package do not make this distributional assumption and integrate over uncertainty in the parameters.
This only assumes that any one observation can be omitted without having a major effect on the posterior distribution

ELPD

The difference in ELPD will be negative if the expected out-of-sample predictive accuracy of the first model is higher.
If the difference is be positive then the second model is preferred.

Example loo

model1a <-stan_lmer(vot*1000 ~ consonant2 + (1+consonant2|speaker_id),
data=data_vot,
prior_intercept = normal(0, 5),
prior = normal(0, 5), 
prior_covariance = decov(regularization = 2),
chains = 4,
iter = 2000, adapt_delta=0.999,warmup=1000)

loo1 <- loo(model1) 
loo1a <- loo(model1a)

Compare models

compare_models(loo1,loo1a)

## elpd_diff        se 
##       0.4       1.1

Summary

Pros and cons of Bayesian analyses

Pros
Easier to intrpret
No p-values
Uses past knowledge (more scientific)
Can use more complex models
Can do multiple comparisons
Have distributions of results
Can make more nuanced claims
Cons
Slower
New for linguistics so have to explain more

Resources

data-science R Bayesian

Intro to Bayesian Linear Mixed Effect Models with rstanarm

Overview of Presentation

Bayesian vs. Frequentist Inference

Frequentist Inference

Bayesian Inference

Bayes’ Theorem

Bayesian Terms

What they look like

Applying Bayes’ Theorem: Example Problem

Solution

Solution (con’t)

Solution visualized

Applying Bayesian inference to LMERs

Applying Bayesian Inference

Applying Bayesian Inference

rstanarm

Selecting priors

Selecting priors

VOT example

Sampling the posterior distribution

Let’s make some models!

More models

One last model

Interpreting Results

What are p-values?

Intrepreting p-values

Bayesian Approach

Model 1 Results

Model 2 Results

Model 3 Results

Compared to frequentist model

Plotting Posterior Distribution: Model1

Plotting Posterior Distribution: Model2

Plotting Posterior Distribution: Model3

Diagnostics and Comparing Models

Model Diagnostics

Example of Shinystan

How well did the model sample

Model Comparisons

Loo

ELPD

Example loo

Compare models

Summary

Pros and cons of Bayesian analyses

Resources

Related