We're covering the Central Limit Theorem and taking large numbers of samples from a dataset. To my way of thinking, we should be sampling with replacement, but the exercise comes out incorrect if I do it that way. And unfortunately, DataCamp doesn't tell me *why* they're sampling without replacement.
Sample with replacement:
* the underlying data source is the same for every sample
* the distribution of the sample means is approximately normal, so the standard results just work (e.g., ~68% of sample means fall within one SD of their mean, ~95% within two SDs; see the sketch after this list)
* with a large enough set of samples, the total number of sampled observations can exceed the number of actual observations.
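To make that concrete, here's a minimal sketch of the with-replacement case. The `population` array is hypothetical stand-in data (deliberately non-normal), not the DataCamp dataset: because every sample is drawn from the full population, the 68/95 rules hold for the sample means.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical, deliberately skewed population standing in for the dataset.
population = rng.exponential(scale=2.0, size=10_000)

n_samples, sample_size = 5_000, 50
# Every sample is drawn from the same full population, with replacement.
means = np.array([
    rng.choice(population, size=sample_size, replace=True).mean()
    for _ in range(n_samples)
])

mu, sd = means.mean(), means.std()
print(f"within 1 SD: {np.mean(np.abs(means - mu) <= sd):.0%}")      # ~68%
print(f"within 2 SD: {np.mean(np.abs(means - mu) <= 2 * sd):.0%}")  # ~95%
```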
Sample without replacement:
* the underlying data source changes for each succeeding sample
* the mean of the sample means should gradually approach the mean of the source distribution, since as the number of samples increases, the samples come closer to exhausting the underlying source
* exhausting the underlying source means that no more samples can be drawn once the number of sampled observations equals the number of underlying observations (see the sketch after this list).
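And here's a sketch of my reading of the without-replacement case, again with hypothetical stand-in data: each drawn sample is removed from the pool, so the pool shrinks with every sample and the loop stops once the source is exhausted.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical source pool; shrinks as samples are removed from it.
pool = list(rng.exponential(scale=2.0, size=1_000))

sample_size = 50
sample_means = []
while len(pool) >= sample_size:
    # Draw indices without replacement, then delete them from the pool.
    idx = set(rng.choice(len(pool), size=sample_size, replace=False).tolist())
    sample_means.append(np.mean([pool[i] for i in idx]))
    pool = [v for i, v in enumerate(pool) if i not in idx]

print(f"samples drawn before exhaustion: {len(sample_means)}")  # 1000 / 50 = 20
print(f"mean of sample means: {np.mean(sample_means):.3f}")
```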
I can see how the CLT applies when replacement is turned on, but without replacement, each sample is drawn from a different dataset ... smaller and smaller subsets of the original dataset.