    David Chisnall (*Now with 50% more sarcasm!*) (david_chisnall@infosec.exchange)'s status on Monday, 23-Dec-2024 02:03:46 JST
    in reply to
    • Haelwenn /элвэн/ :triskell:
    • Mx Autumn :blobcatpumpkin:

    @lanodan @carbontwelve Spam filtering has been a good application for machine learning for ages. I think the first Bayesian spam filters were added around the end of the last century. It has several properties that make it a good fit for ML:

    • The cost of letting spam through is low, while the value of filtering most of it correctly is high.
    • There isn’t a rule-based approach that works well. You can’t write a list of properties that make something spam. You can write a list of properties that indicate something has a higher chance of being spam.
    • The problem changes rapidly. Spammers change their tactics depending on what gets through the filters, so a system that adapts on the defence works well, and you have a lot of ham-versus-spam data to drive that adaptation.
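
    As a rough illustration, here is a minimal sketch in Python of the kind of Bayesian word-frequency spam filter mentioned above; the toy corpus, the Laplace smoothing constant, and the function names are invented for the example rather than taken from any real filter.

    ```python
    from collections import Counter
    import math

    def train(messages):
        """Count, per class, how many messages each word appears in.

        `messages` is an iterable of (text, is_spam) pairs.
        """
        counts = {"spam": Counter(), "ham": Counter()}
        totals = {"spam": 0, "ham": 0}
        for text, is_spam in messages:
            label = "spam" if is_spam else "ham"
            totals[label] += 1
            counts[label].update(set(text.lower().split()))
        return counts, totals

    def spam_probability(text, counts, totals, k=1.0):
        """Combine per-word likelihoods under a naive independence assumption.

        Laplace smoothing with constant k keeps unseen words from forcing
        the estimate to exactly 0 or 1.
        """
        log_spam = math.log(totals["spam"] / (totals["spam"] + totals["ham"]))
        log_ham = math.log(totals["ham"] / (totals["spam"] + totals["ham"]))
        for word in set(text.lower().split()):
            log_spam += math.log((counts["spam"][word] + k) / (totals["spam"] + 2 * k))
            log_ham += math.log((counts["ham"][word] + k) / (totals["ham"] + 2 * k))
        return 1.0 / (1.0 + math.exp(log_ham - log_spam))

    # Toy corpus; retraining on fresh ham/spam is what lets the filter adapt
    # as spammers change tactics.
    corpus = [
        ("cheap pills buy now", True),
        ("limited offer buy cheap", True),
        ("meeting notes attached", False),
        ("lunch tomorrow?", False),
    ]
    counts, totals = train(corpus)
    print(spam_probability("buy cheap pills now", counts, totals))    # high
    print(spam_probability("notes from the meeting", counts, totals)) # low
    ```

    Retraining on a fresh stream of ham and spam is what provides the rapid adaptation the list above calls for.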

    Note that the same is not true for intrusion detection, and a lot of ML-based approaches to intrusion detection have failed. Missing a compromise is costly, and you don’t have enough examples of malicious and non-malicious data for your categoriser to adapt rapidly.

    The last point is part of why it worked well in my use case and was great for Project Silica when I was at MS. They were burning voxels into glass with lasers and then recovering the data. With a small calibration step (burn a load of known-value voxels into a corner of the glass) they could build an ML classifier that worked on any set of laser parameters. It might not have worked quite as well as a well-tuned rule-based system, but they could do experiments as fast as the laser could fire with the ML approach, whereas a rule-based system needed someone to classify the voxel shapes and redo the implementation, which took at least a week. That was a huge benefit. Their data included error-correction codes, so as long as their model was mostly right, ECC would fix the rest.
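
    As a purely hypothetical sketch of the calibration-then-classify workflow described above (the real Project Silica pipeline is not reproduced here), the following uses a nearest-centroid classifier as a stand-in: known-value calibration voxels are read back to estimate a centroid per symbol, data voxels are assigned to the nearest centroid, and residual symbol errors are left for the error-correction codes to fix. The feature count, noise model, and function names are all invented for the illustration.

    ```python
    import numpy as np

    def train_from_calibration(readouts, labels):
        """Average the readouts of the known-value calibration voxels per symbol."""
        classes = np.unique(labels)
        centroids = np.stack([readouts[labels == c].mean(axis=0) for c in classes])
        return classes, centroids

    def classify(readouts, classes, centroids):
        """Assign each voxel readout to the symbol with the nearest centroid."""
        dists = np.linalg.norm(readouts[:, None, :] - centroids[None, :, :], axis=2)
        return classes[dists.argmin(axis=1)]

    # Toy demonstration: 4 symbol classes, noisy 3-feature voxel readouts.
    rng = np.random.default_rng(0)
    true_centroids = rng.normal(size=(4, 3))

    cal_labels = np.repeat(np.arange(4), 50)          # known values burned in a corner
    cal_readouts = true_centroids[cal_labels] + rng.normal(scale=0.3, size=(200, 3))
    classes, centroids = train_from_calibration(cal_readouts, cal_labels)

    data_labels = rng.integers(0, 4, size=1000)       # the actual stored data
    data_readouts = true_centroids[data_labels] + rng.normal(scale=0.3, size=(1000, 3))
    predicted = classify(data_readouts, classes, centroids)
    print(f"raw symbol error rate: {(predicted != data_labels).mean():.3%}")  # ECC fixes the rest
    ```

    Rerunning only the small calibration step for a new set of laser parameters retrains the whole decoder, which is what made experimentation as fast as the laser could fire.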

    In conversation about 5 months ago from gnusocial.jp
