@todwest@gutenberg_org I was thinking the same. The Selknam, the Yagan, and the Kawesqar navigated those canals many years before, and they did very well to survive in such difficult conditions.
@evan I know that place and that church! I was there at the end of June when visiting the ICS-FORTH institute after attending the Extended Semantic Web Conference.
@evan Regarding 2, it seems that in this case the ActivityStreams specification is not followed when your WordPress article is displayed as a note because it has many paragraphs.
Your answer suggests the following (which is quite interesting):
1. `Note` and `Article` seem to be implicitly associated with different HTML subsets.
2. An object (with a single ID) can have two representations and belong to two classes, `Note` and `Article`. I assume that these classes are not disjoint. This may lead to ontological inconsistencies.
3. The downscaling looks like a type cast that depends on the capabilities of one endpoint (it may not be needed between two WordPress instances). The downscaling seems to be done by WordPress, whereas it could also be done by Mastodon when rendering the object. This sounds like content negotiation.
Yes, sanitizing HTML elements will produce interoperability problems.
I would trust the input if it satisfies a given schema. Otherwise, I would try to fix unclosed elements to match the requirements for the object type's content. As a last resort, instead of sanitizing elements, I would delete them and leave only the text. This idea follows the robustness principle: be conservative in what you do, be liberal in what you accept from others. I also think this principle should be used with caution.
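As a rough sketch of this fallback, assuming a small hypothetical tag whitelist stands in for "a given schema": well-formed content that uses only allowed tags is kept as-is, and anything else is reduced to its text. The names `ALLOWED_TAGS` and `accept_content` are illustrative only, and the intermediate step of repairing unclosed elements is left out.

```python
from html.parser import HTMLParser

# Hypothetical subset standing in for a real schema; not taken from any spec.
ALLOWED_TAGS = {"p", "em", "strong", "a", "br"}
VOID_TAGS = {"br"}  # elements that never have a closing tag


class _TagAudit(HTMLParser):
    """Collects the text and checks that tags are allowed and properly nested."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text_parts = []
        self.stack = []
        self.valid = True

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.valid = False
        elif tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.valid = False

    def handle_data(self, data):
        self.text_parts.append(data)


def accept_content(html):
    """Return html unchanged if it passes the audit, otherwise only its text."""
    audit = _TagAudit()
    audit.feed(html)
    audit.close()
    if audit.valid and not audit.stack:
        return html
    return "".join(audit.text_parts)
```

With this sketch, `accept_content("<p>I <em>really</em> like strawberries!</p>")` returns the input unchanged, while content with a disallowed or unclosed element comes back as plain text.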
I feel a little disappointed by how we are facing problems that were resolved a long time ago. When Web content was understood as XML documents, each element had a specific syntax that could be validated. Elements were composable, and we were able to define declarative mappings to, for example, implement the downscaling needed by the Mastodon UX to render a WordPress article. In this regard, I agree with most of @stevenpemberton's arguments in his "The 100 years web" talk (https://www.youtube.com/watch?v=jl4fnY4BjEY). But this is probably a story for another thread.
ActivityPub defines a stream of objects whose content is essentially text, but can include HTML tags. For example, "<p>I <em>really</em> like strawberries!</p>" (which I take from Example 8 in https://www.w3.org/TR/activitypub/).
In https://cosocial.ca/@evan/111771562317992298, @evan distinguishes short-form text (e.g., Mastodon's 500-character posts) from long-form text (WordPress article entries). Short-form text does not require much markup, but long-form text may require it to share a faithful copy of an article.
Currently, a WordPress blog does not look the same on Mastodon because the HTML elements are changed. For example, @evan's blog entry https://evanp.me/2023/12/26/big-fedi-small-fedi/ has the element `<h2 class="wp-block-heading">Big Fedy</h2>`, which Mastodon shows as `<p><strong>Big Fedy</strong></p>`. I don't know who changed the original document structure (Mastodon or WordPress), but it appears that the HTML elements were modified to avoid breaking the Mastodon UX. I imagine that if we include SVG code directly on WordPress pages, this code may be complicated to render on Mastodon, even when the browser supports SVG rendering.
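To make the observed rewrite concrete, here is a minimal sketch of such a downscaling step written as a declarative mapping. It is not how Mastodon or the WordPress plugin actually implements it; the `DOWNSCALE` table and the `downscale` function are hypothetical, and the only rule included is the `h2` case observed above.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping: source tag -> nested replacement tags (outermost first).
DOWNSCALE = {"h2": ["p", "strong"]}


class _Downscaler(HTMLParser):
    """Rewrites mapped elements and passes every other element through."""

    def __init__(self, mapping):
        super().__init__(convert_charrefs=True)
        self.mapping = mapping
        self.out = []

    def handle_starttag(self, tag, attrs):
        targets = self.mapping.get(tag)
        if targets is None:
            # Unmapped tag: keep the original start tag, attributes included.
            self.out.append(self.get_starttag_text())
        else:
            # Mapped tag: attributes such as class are dropped.
            self.out.extend(f"<{t}>" for t in targets)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied as written.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        targets = self.mapping.get(tag)
        if targets is None:
            self.out.append(f"</{tag}>")
        else:
            self.out.extend(f"</{t}>" for t in reversed(targets))

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape the text.
        self.out.append(escape(data, quote=False))


def downscale(html, mapping=DOWNSCALE):
    parser = _Downscaler(mapping)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(downscale('<h2 class="wp-block-heading">Big Fedy</h2>'))
```

Because the rules live in a table, a receiver with richer rendering (for example, another WordPress instance) could use an empty mapping and keep the original structure.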
@evanprodromou, which HTML fragment is allowed for the content of an ActivityPub object?
Also, what content format is allowed for objects? Long-form HTML can include many different elements, such as SVG, equations, etc. A long-form text object could be divided into several sections. Do you want to support a reduced HTML subset, or something extensible, as the XML people dreamed of?
I am a Chilean computer scientist at the Institute for Artificial Intelligence of the University of Stuttgart, Germany, researching Web and Data Science, knowledge graphs, graph data, and data provenance. I write about multiple topics in English and Spanish. #SemanticWeb #KnowledgeGraphs #DataProvenance