Proposal to normalize wikinames for 3.0:
Generic comments:
- Turn the current list of allowed characters into a user setting (taking care of Murray's concerns)
- Upon save, disallowed characters create an error instead of being silently ignored
- Make the fallback lookup of a [free link] under its CamelCased page name a user setting (off by default for new wikis). This provides backwards compatibility by resolving [this is a link] to ThisIsALink.
When a freelink is encountered on a page (below, wikipath means the FQN of the page; in 2.x this is exactly the WikiName; in 3.0 it can also contain subpage+space information):
- Link is parsed to its three elements: text, wikipath, parameters
- Whitespace in front of and after the individual wikipath components (separated by / or :) is removed. Any excess whitespace is collapsed.
- Each component is checked against the allowed character list, and if disallowed characters are detected, the link parsing is stopped and a warning condition is raised
- Each component is lowercased with String.toLowerCase() to make any page name comparisons case insensitive.
- The resulting wikipath is turned into a JCR path (and any characters allowed by JSPWiki but not allowed by JCR spec are escaped)
- The JCR path is then passed to the Repository and it is checked if page exists
- If the user setting so dictates and the wikipath is not found, it is CamelCased using the current TextUtil.wikifyLink() and the lookup is tried again.
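The trimming, checking and lowercasing steps above can be sketched in Java as follows. This is only an illustration: the class name WikiPathNormalizer is made up, the illegal character list is the Wikipedia-style default suggested further down this page, and an exception stands in for the "warning condition". Escaping for JCR and the repository lookup are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of JSPWiki.
final class WikiPathNormalizer {
    // Assumed default; the proposal leaves open whether this becomes an
    // "allowed" or an "illegal" character setting.
    static final String ILLEGAL_CHARACTERS = "#<>|{}";

    /** Splits a wikipath on ':' and '/' and normalizes each component. */
    static List<String> normalize(String wikiPath) {
        List<String> components = new ArrayList<>();
        for (String raw : wikiPath.split("[:/]")) {
            // Strip leading/trailing whitespace, collapse internal runs to one space.
            String component = raw.trim().replaceAll("\\s+", " ");
            for (int i = 0; i < component.length(); i++) {
                char c = component.charAt(i);
                if (ILLEGAL_CHARACTERS.indexOf(c) >= 0) {
                    // Stand-in for stopping the link parsing and raising a warning.
                    throw new IllegalArgumentException(
                        "Illegal character '" + c + "' in component: " + component);
                }
            }
            // Lowercase so that page name comparisons are case insensitive.
            components.add(component.toLowerCase(Locale.ROOT));
        }
        return components;
    }
}
```

With these assumptions, normalize("  Main : Foo   Bar ") yields the components "main" and "foo bar", while "Main:Foo|Bar" is rejected.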
When a page is created, the following process takes place:
- The wikiPath components are stripped of leading and trailing whitespace. Any excess whitespace is collapsed.
- Each component is checked against the allowed character list
- A "wiki:title" property is set to correspond to the typography of the resulting name of the page (=last component in the wikipath)
- Each component is lowercased with String.toLowerCase()
- The wikipath is then turned into a JCR path and the content (including properties) is saved
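A minimal sketch of the creation steps, to show that the title is captured before lowercasing. PageCreationSketch is a hypothetical class; the "/pages" root is taken from the example path used on this page, and the character check and JCR escaping are omitted for brevity.

```java
import java.util.Locale;

// Hypothetical value holder, not part of JSPWiki.
final class PageCreationSketch {
    final String jcrPath; // lowercased, used for organization
    final String title;   // would be stored as the "wiki:title" property

    private PageCreationSketch(String jcrPath, String title) {
        this.jcrPath = jcrPath;
        this.title = title;
    }

    static PageCreationSketch create(String wikiPath) {
        StringBuilder path = new StringBuilder("/pages");
        String title = "";
        for (String raw : wikiPath.split("[:/]")) {
            String component = raw.trim().replaceAll("\\s+", " ");
            // The allowed-character check would go here (omitted in this sketch).
            title = component; // the last component wins and keeps its typography
            path.append('/').append(component.toLowerCase(Locale.ROOT));
        }
        return new PageCreationSketch(path.toString(), title);
    }
}
```

Under these assumptions, create("Main:Foo bar") gives the JCR path "/pages/main/foo bar" and the title "Foo bar".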
When a page title is rendered, the following process takes place:
- The "page" parameter is parsed into a wikipath
- The wikiPath components are stripped of extra whitespace
- The illegal chars are checked
- The JCR path is formed
- The Node is fetched, and WikiPage created. The title of the page is from the "wiki:title" property.
When a page is renamed:
- The proper WikiPage object is located.
- If the rename process would result in a different JCR path, the page is moved
- In any case, the new title is written to the value of the wiki:title property
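The rename rule can be illustrated with an in-memory map standing in for the repository. Everything here is hypothetical (class name, method names, the map itself). Refusing a rename onto an existing path is an added assumption that the proposal does not address.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the repository: JCR path -> wiki:title value.
final class RenameSketch {
    final Map<String, String> titlesByPath = new HashMap<>();

    static String cleanName(String name) {
        return name.trim().replaceAll("\\s+", " ");
    }

    static String toPath(String space, String name) {
        return "/pages/" + cleanName(space).toLowerCase(Locale.ROOT)
             + "/" + cleanName(name).toLowerCase(Locale.ROOT);
    }

    /** Renames a page; returns true if the node had to be moved. */
    boolean rename(String space, String oldName, String newName) {
        String oldPath = toPath(space, oldName);
        String newPath = toPath(space, newName);
        if (!titlesByPath.containsKey(oldPath)) {
            throw new IllegalArgumentException("No such page: " + oldPath);
        }
        boolean moved = !oldPath.equals(newPath);
        if (moved) {
            if (titlesByPath.containsKey(newPath)) {
                // Assumption: collisions are refused rather than overwritten.
                throw new IllegalStateException("Target exists: " + newPath);
            }
            titlesByPath.remove(oldPath); // move the node
        }
        // In any case the new title is written.
        titlesByPath.put(newPath, cleanName(newName));
        return moved;
    }
}
```

Renaming "Foo bar" to "FOO Bar" changes only the stored title, because both names map to the same lowercased path. Renaming to "Baz" also moves the node.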
Yes, this means that a page title and its JCR path will be subtly different: the wiki:title property keeps the representation and the path keeps the organization.
E.g. "?page=Foo%20bar" => WikiName = "Main:Foo bar" => JCR path = "/pages/main/foo bar" => wiki:title = "Foo bar".
Summary/Paraphrase of above?
- A new property tryCamelCase=true|false controls if a request for "Test Name", "Test name", "Test+Name", "Test+name", "Test%20Name" or "Test%20name" looks for "TestName" in the repository.
- A new property tryBeautified=true|false controls if a request for "TestName" gets broken into "Test Name" (and then further normalized to "Test name").
- A new property illegalCharacters defaults to the same list as Wikipedia, "#<>|{}" (or alternatively allowedCharacters? not sure which sense is best.)
- Any illegal characters in a path-component at page creation cause an error.
- Any illegal characters in a [free link|Some:path] at time of rendering are silently dropped and a view link to the normalized name (if already existing) is rendered, else render "specially" as a "bad link name"?
- Each path-component gets trimmed.
- All internal whitespace of each path-component is collapsed to a single space.
- e.g. (spaceName:topPageName\subPageName\attachmentName) has four components.
- What about links to headings? (spaceName:\topPageName\subPageName#headingName) Don't spaces get mutated into '-' characters currently?
- There may be a difference between the name and title for a page:
- Names are all lowercase
- Titles maintain the original case of the name as given at time of creation.
- Is this an invariant? (name.equalsIgnoreCase(title) == true)
- Titles are a maintained property, separate from, but closely related to, the page name.
- Renames affect both the name and the title of the page.
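As a rough illustration of how tryCamelCase could drive the lookup order, here is a sketch that lists the names to try for a request. The class and method names are hypothetical, the request is assumed to be already URL-decoded, and camelCase() is a simplified stand-in for TextUtil.wikifyLink(), not its actual behaviour. How a legacy CamelCase name would be stored in a lowercased repository is not settled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of JSPWiki.
final class LookupCandidates {
    /** Simplified stand-in for TextUtil.wikifyLink(): "Test name" -> "TestName". */
    static String camelCase(String name) {
        StringBuilder sb = new StringBuilder();
        for (String word : name.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            sb.append(Character.toUpperCase(word.charAt(0))).append(word.substring(1));
        }
        return sb.toString();
    }

    /** Names to look up, in order, for an already URL-decoded request. */
    static List<String> candidates(String requested, boolean tryCamelCase) {
        List<String> names = new ArrayList<>();
        // The normalized (trimmed, collapsed, lowercased) name is always tried first.
        names.add(requested.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT));
        if (tryCamelCase) {
            String camel = camelCase(requested);
            if (!names.contains(camel)) {
                names.add(camel);
            }
        }
        return names;
    }
}
```

For example, candidates("Test name", true) yields "test name" followed by "TestName", and with tryCamelCase=false only "test name" is tried.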