In a methods / #DigitalHumanities class next semester, I want to cover basic corpus creation. In particular, I’ll probably focus on #OCR/#HTR/#ATR and #WebScraping. I find it incredibly hard to find good papers that can serve as a general introduction to these topics. All I find are either practical tutorials or very specialized papers about specific approaches. Do you have any favorite readings about how to get to a text corpus in DH in the first place? Please share!