Requirements for Source Documents: Difference Between Revisions

Revision as of 25 September 2023, 11:13

Document Requirements and Limitations

The XBRL Tagger is able to tag MS Word, InDesign, XHTML and PDF documents properly with the following requirements and limitations:

  • It is not possible to tag any value of a table that is included as an image in a document.
  • InDesign documents (IDD) can be tagged by exporting to ePub (File → Export to ePub). The document must use fonts that are available for web-browsing (XHTML).
  • Scanned (PDF) reports can't be tagged because the XBRL Tagger does not include an OCR module.
  • For ePub files special fonts must be provided in the XBRL Tagger's fonts folder (see below).
  • For PDFs hidden text as well as some font-specific settings might lead to issues. (See more information below)

Differences Between Source Formats

Feature                    | Word                 | EPub                             | PDF (pdf2htmlEx)         | HTML
A4 Layout                  | Optional             | Enforced                         | Enforced                 | N/A
WYSIWYG                    | N/A                  | Light                            | Full                     | N/A
Tags-Saving                | In file/external     | External                         | External                 | In file/external
Chapter detection          | Styles Outline Level | Pages                            | Document Bookmarks/Pages | Headers
Font handling              | Integrated           | Font must be supplied externally | Integrated               | Integrated
Table detection            | Auto                 | Auto                             | Manual                   | Auto
Smart anchors              | Yes                  | Yes                              | Yes                      | No
XHTML formatting preserved | Partly*              | Full                             | Full                     | Full
Multiple tags per value    | Yes                  | No                               | No                       | No


*This depends on the styles and formats applied to paragraphs; see the MS Word limitations below.


How to Prepare a PDF File

PDF Requirements and Limitations

The XBRL Tagger is able to tag any PDF documents properly with the following requirements and limitations:

  • It is not possible to tag any value of a table that is included as an image in a document.
  • Make sure that the fonts used are correct with regard to glyphs (this also applies to Word fonts when converting to PDF); otherwise the conversion could produce wrong characters.
  • Scanned (PDF) reports can't be tagged because the XBRL Tagger does not include an OCR module.
  • For PDFs, hidden text as well as some font-specific settings might lead to issues. (See more information below)
  • Always use the same software to create different versions of the PDF; otherwise restoring the mapping might fail.
    • For example, if you create a PDF from Word and tag it, changing that PDF afterwards with another tool such as Adobe Pro can cause issues.
    • If you need to stitch multiple documents together, use the Merge iXBRL functionality in the Tagger after converting all parts to XHTML instead.
  • Don't embed external PDFs into InDesign documents

Recommendations on Tagging of PDF Documents

PDF is a very universal format for creating documents. Converting it to XHTML can be a challenge, especially if the PDF document that is used as a source has issues itself.

Here are some recommendations to create the best-possible conversion outcome:

  • Keep in mind that PDF to HTML conversion is similar to actual printing, but on a very special virtual device. As with printing on a physical print station, this process can have font and color issues.
  • Make all fonts embedded.
  • Never add tables as pictures. This also applies when converting from Word.
  • Do not use Type 3 fonts; they are not supported.
  • For CID fonts, make sure they include correct character mappings definitions.
  • Do not include hidden text in PDF documents, or remove it with Adobe Acrobat Redact.
  • Do not place any stamps/signs in PDF comments.
  • Use RGB color space.
  • Do not use special ICC color profiles.
  • Create a PDF document that complies with the PDF/A-1a standard and does not contain text that cannot be mapped to Unicode or that is inconsistent with the information for the rendered glyphs.
  • Major layout changes (styles, one-column to two-columns) can have a serious impact on the mapping restoration. Bear that in mind when planning.
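The Type 3 recommendation above can be pre-checked roughly before a PDF is handed to the Tagger. The following is a minimal sketch, not part of the XBRL Tagger; the function name is hypothetical. It only scans the raw bytes for an uncompressed font dictionary, so fonts stored inside compressed object streams are missed and a negative result proves nothing.

```python
import re

# Matches a font dictionary entry such as "/Subtype /Type3" or "/Subtype/Type3".
TYPE3_PATTERN = re.compile(rb"/Subtype\s*/Type3\b")

def mentions_type3_font(pdf_bytes):
    """Naive check: True if an uncompressed dictionary declares /Subtype /Type3."""
    return TYPE3_PATTERN.search(pdf_bytes) is not None
```

To use it, read the file with `open(path, "rb").read()` and pass the bytes in. A positive result is a reason to inspect the fonts in Adobe Acrobat, not a final verdict.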

Keep in mind that the tagging of PDFs requires an extra step.

What Are Hidden Facts?

When converting and tagging a PDF report with special font face in the XBRL Tagger, some facts (tags) might become hidden. The reason is that the Inline XBRL Specification does not allow individually formatted numbers to be tagged; e.g. when the font requires a special spacing between single characters by using HTML tags like , the number is no longer taggable. In the screenshot below, the number 24,540 is not taggable. In order to preserve the spacing and formatting of the PDF in the XHTML report, the XBRL Tagger moves the tag to an unformatted hidden section of the document and includes a link to the visual original number.

However, hiding facts is an official mechanism of the Inline XBRL specification, and it is also allowed by ESMA in the Reporting Manual, page 34:

From AMANA's point of view, untaggable items, like the number in the example above, are not eligible for transformation and can be hidden. The XBRL International standard setter working group is aware of the issue and will probably publish an updated Inline XBRL specification, which will make those numbers taggable in the future.
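The idea of an "individually formatted" number can be illustrated with a small check. This is a simplified sketch, not the Tagger's actual logic: it treats a number as directly taggable only if its element contains plain text and no nested markup. The function name and the use of a span element for the character spacing are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def is_directly_taggable(fragment):
    """Return True if the element holds plain text with no nested markup."""
    element = ET.fromstring(fragment)
    # Any child element means part of the number is formatted individually.
    return len(element) == 0

plain = "<td>24,540</td>"
# Hypothetical markup: one digit wrapped in its own element to adjust spacing.
spaced = "<td>24,<span class='sp'>5</span>40</td>"

print(is_directly_taggable(plain), is_directly_taggable(spaced))
```

In the second fragment the digits of 24,540 are split across elements, which corresponds to the situation in which the Tagger moves the fact to the hidden section.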

How to Avoid Hidden Facts

There are multiple ways to avoid or reduce hidden facts in iXBRL reports:

  • Tag Microsoft Word files instead of PDFs
  • Do not use special non-web fonts in PDF reports that provide a special spacing between characters.
  • Set the XBRL Tagger CMaps option to "Ignore" when opening a PDF file (however, this might lead to less attractive reports).
  • Use the latest XBRL Tagger version, which includes some new options to reduce/avoid hidden facts.
  • All numbers that are tagged need to have the OpenType setting “Default Figure Style” to avoid hidden facts. This setting only affects the digits in the report. To apply it, either choose “Default Figure Style” manually in the number columns or set it in the Paragraph Style under “OpenType features”. Use 0 kerning in the tagged cells for the best result.
  • Other problems that can occur when you convert the PDF to XHTML may be:
    • Text opacity - If you have a text with opacity in the document, the opacity will go back to default 100% after the conversion to XHTML. It will work if you create outlines of the text.
    • Text behind - If you have text hidden behind something in InDesign, the text will be visible when you convert to XHTML.

Remove Hidden Text From PDF Files Using Adobe Redact

If the PDF contains hidden elements, some of them can be removed using Adobe Redact. Hidden text becomes visible in the converted XHTML document, so it must be removed before processing the PDF document with the Tagger.

Load the file into Adobe Acrobat Pro and click on the tools button.

Go to Protect & Standardize and click on Redact.

Click on Sanitize Document.

In the window that opens, click on Click here.

After that you get a selection of all hidden elements. Remove all checks, but keep the one for Hidden Text and click on Remove.

Further Information About PDF Conversion

The limitations of the PDF converter:

  • CID (Identity-H) fonts embedded in the source document.
    • In this case, the converted document can contain unreadable (weird-looking) text. To resolve this, save the source document in PDF/X format using the Adobe Acrobat DC "Print Production" tool.
  • If the converted document has a wrong color palette, apply the PDF/X fix described in the first item.
  • The converter does not support PDF hidden text layers.
    • If the document contains any, remove the hidden text layers with the Adobe Acrobat DC "Redact" tool.
  • The converter has fine-tuning options that help to resolve these issues:
    • If the converted document does not look good, change the option "PDF unicode CMaps handling" to "Auto" and "Use autohint on fonts without hint" to "Use AutoHint".

If the conversion still doesn't meet the expectations or some tables cannot be tagged properly, the source file might need corrections.

The following cases are known:

  1. The converted PDF looks good, but the imported table is unreadable.
  2. The converted PDF contains unreadable fragments.
  3. The PDF document has not been converted at all in the Tagger.
  4. The converted PDF shows wrong colors, visual artifacts or extra text fragments or pages.

For cases 1-3 there are two methods to repair the document in Adobe:

  • Export the PDF to postscript and create a new file from it in Adobe Acrobat Distiller DC;
  • Convert the PDF to the PDF/A-1 standard with Adobe Preflight in the "PDF standards" tool.

For case 4 use "Sanitize Document" in the "Redact" tool and convert the document to PDF/X for correct colors.

If the document after all processing still has artifacts, the "fallback mode" option in the Tagger can be used.

How to Prepare a PDF File using InDesign (Best Practices)

General information

In this section you will get information on how to prepare your InDesign documents for tagging so you will avoid issues in your ESEF report.

When you create your ESEF report, a PDF is converted to XHTML. Converting PDF to XHTML is complicated and it’s important that you have the right settings in your InDesign documents to avoid issues.

Most of the issues are related to OpenType features in the documents’ fonts. In InDesign you can use different variants (Glyphs) of the characters. If you use two or more glyphs of a character with the same Unicode in your report the Tagger cannot distinguish between them. Only one of the glyph variants will be used in the report. This could make the report look different in the Tagger compared to the PDF.
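The glyph conflict described above can be pictured as a mapping from each Unicode code point to the glyph IDs used for it. The sketch below assumes you already have a list of (code point, glyph ID) pairs from some font inspection tool. The function name and the sample IDs are hypothetical, and this is not how the Tagger itself works internally.

```python
from collections import defaultdict

def ambiguous_code_points(used_glyphs):
    """Map each code point to its glyph IDs and keep those with more than one."""
    by_code_point = defaultdict(set)
    for code_point, glyph_id in used_glyphs:
        by_code_point[code_point].add(glyph_id)
    return {cp: sorted(gids) for cp, gids in by_code_point.items() if len(gids) > 1}

# Hypothetical input: the digit "1" (U+0031) appears with two glyph variants.
used = [(0x31, 18), (0x31, 455), (0x32, 19)]
print(ambiguous_code_points(used))
```

Every code point reported here is one where only a single variant would survive the conversion, so the XHTML may look different from the PDF.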

Ligatures and different Figure Styles are known to create issues. For example, you cannot mix different Figure Styles in the report. The Tagger will then choose one Figure Style in the report and this may cause the XHTML not to be generated correctly. The recommendation is to set the report to Default Figure Style to avoid issues.

The fonts you are using must be Unicode compatible. Most fonts today are compatible, but older or custom fonts may not be.

Do not use Variable fonts in your report. You need to use Static fonts to avoid conversion issues.

Variable fonts incorporate many different variations of a typeface into a single file.

Static fonts have a separate font file for every width, weight, or style.
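Whether a font file is variable can be checked before it is used in a report: variable fonts carry an 'fvar' (font variations) table. The following sketch reads the table directory of a plain TTF/OTF file using only the standard library. The function names are hypothetical, and font collections (TTC) and WOFF files are not handled.

```python
import struct

def sfnt_table_tags(data):
    """Return the table tags listed in the table directory of a TTF/OTF font."""
    # Offset table: 4-byte version, then numTables as a big-endian uint16.
    num_tables = struct.unpack_from(">H", data, 4)[0]
    tags = []
    for i in range(num_tables):
        # Table records start at byte 12; each is 16 bytes and begins with a 4-byte tag.
        offset = 12 + 16 * i
        tags.append(data[offset:offset + 4].decode("latin-1"))
    return tags

def is_variable_font(data):
    """A font with an 'fvar' table defines variation axes, i.e. it is variable."""
    return "fvar" in sfnt_table_tags(data)
```

Pass in the bytes of the font file, for example from `open(path, "rb").read()`. If the function returns True, look for a static version of the typeface instead.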

Example of character with same Unicode but different Glyph ID (GID)

InDesign settings to avoid issues in the ESEF report

Turn off OpenType features

Ensure that OpenType features are turned off in your Paragraph Styles.

The recommendation is to use Default Figure Style in the report.

If you want to change the Figure Style (e.g. Tabular Lining) you need to be sure to change the setting in the whole report to avoid issues in the conversion to XHTML. If you use different Figure Styles in your report the Tagger will choose one of them and this may cause the XHTML not to be generated correctly.

Turn off Ligatures

Ensure that Ligatures (e.g. “fi” and “ff”) are turned off in your Paragraph Styles. Ligatures are enabled by default in InDesign and must be turned off to avoid issues.

Ligature: A glyph that combines the shapes of certain sequences of characters into a new form that makes for a more harmonious reading experience.
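As a rough after-the-fact check, text copied from the converted XHTML can be scanned for Unicode ligature characters. This is only a sketch under an assumption: InDesign ligatures are normally glyph substitutions, so the check catches only cases where a ligature code point (U+FB00 to U+FB06) actually ended up in the text. The function name is hypothetical.

```python
import unicodedata

# The Latin ligatures in "Alphabetic Presentation Forms": U+FB00..U+FB06.
LIGATURE_RANGE = range(0xFB00, 0xFB07)

def find_ligatures(text):
    """Return (index, ligature, decomposed letters) for each ligature character."""
    return [
        (i, ch, unicodedata.normalize("NFKD", ch))
        for i, ch in enumerate(text)
        if ord(ch) in LIGATURE_RANGE
    ]

# "\ufb01" is the "fi" ligature, "\ufb00" the "ff" ligature.
print(find_ligatures("\ufb01nancial sta\ufb00"))
```

An empty result does not prove that ligatures were turned off in the Paragraph Styles; it only shows that none are present as separate characters in the checked text.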

Avoid “White Space”

Avoid using “Insert White Space” in your report. “White spaces” can result in issues with spacing between words and letters in the XHTML. “Nonbreaking Space” is commonly used by designers but often creates issues. The recommendation is to use “Nonbreaking Space (Fixed Width)” instead. This space is more likely to work. You can create a “Keyboard Shortcut” in InDesign if you use it commonly.
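To see which special spaces a piece of text contains, the characters can be listed by their Unicode names. This is a minimal sketch using the standard library; the function name is hypothetical, and how InDesign's menu names map to individual code points is not assumed here.

```python
import unicodedata

def special_spaces(text):
    """Return (index, code point, Unicode name) for every space other than U+0020."""
    found = []
    for i, ch in enumerate(text):
        # Category "Zs" covers all Unicode space separators.
        if unicodedata.category(ch) == "Zs" and ch != " ":
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return found

print(special_spaces("EUR\u00a01,000\u202fm"))
```

Running this on text copied from the document gives a quick list of the positions where spacing problems in the XHTML are most likely to originate.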

Avoid OpenType Alternatives

Do not use the OpenType alternatives: Superscript/Superior, Subscript/Inferior, Numerator, Denominator. Use InDesign Superscript/Subscript instead.

Issues with “Small Caps”

Depending on the font you are using there could be issues if you use “Small Caps”. The recommendation is to avoid it. If you need to use “Small Caps”, pay attention to how the text looks after the conversion to XHTML.

Issues with “Section Marker”

If you insert a “Section Marker” in InDesign, it’s important that you do not use “All Caps” in the “Section Marker”. If you use “All Caps” font issues can arise in the Tagger and some Lowercase characters may be replaced with Uppercase characters in the report. If you want Uppercase characters in the “Section Marker” you can use Uppercase characters in the “Numbering & Section Options/Section Marker”.

Tab Issue

This issue is less common and depends on the font you are using. If you use tabs with dots as “Leader” make sure you do not have a “Space” in the “Leader”. In some fonts this can become an issue in the conversion to XHTML and all dots in your document will also have a space next to the dot.

Do Not Use Private-use Characters

Private-use character: A character whose use is defined by private users and companies rather than defined by a standard such as Unicode, and which therefore has no universally accepted meaning.

“Private-use characters” are quite unusual to use in InDesign. If you use a “Private-use character” it will not be displayed correctly in XHTML. To get information about characters open the Glyphs panel (Type>Glyphs). See example below.

Private-use character

Unicode character
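Outside InDesign, private-use characters can also be found programmatically, because Unicode assigns them the general category "Co". The following sketch checks a string, for example text copied from the document; the function name is hypothetical.

```python
import unicodedata

def find_private_use(text):
    """Return (index, code point) pairs for private-use characters in text."""
    hits = []
    for index, char in enumerate(text):
        # Unicode general category "Co" marks private-use code points.
        if unicodedata.category(char) == "Co":
            hits.append((index, f"U+{ord(char):04X}"))
    return hits

# "\ue000" is the first code point of the Private Use Area.
print(find_private_use("Revenue \ue000 2023"))
```

Each reported position is a character that should be replaced with a standard Unicode character before conversion.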

Substituted Glyphs

In InDesign you can highlight the “Substituted Glyphs” that may create issues in the ESEF report. Not all of the highlighted glyphs will create issues.

In the example below, Ligatures, Contextual alternatives and Tabular Lining are highlighted and can create issues after conversion to XHTML.

Note: Hyphen-minus is always highlighted as “Substituted Glyphs” in InDesign.

Additional Information

Text Effects

If you have applied effects (e.g. opacity, multiply) on text in the InDesign document it will go back to default after the conversion to XHTML. If you want to apply effects, you need to create outlines of the text.

Text Behind

If you have text hidden behind an object/image in InDesign, the text will become visible when you convert to XHTML.

EPub Font Folder

If special fonts are used in ePub files, the TTF files have to be added to the Tagger font folder:

How to Prepare a Word File

MS Word Requirements and Limitations

The XBRL Tagger is able to tag any MS Word documents properly with the following requirements and limitations:

  • It is not possible to tag any value of a table that is included as an image in a document.
  • For MS Word documents it is required to use styles (heading 1, heading 2, etc.) to structure the documents.
    • The chapter headings are used by the Tagger to allow easy navigation through the document.
    • All tables that have to be tagged must be normal Word tables (no embedded Excel or similar).
    • To change the outline level of styles, right click on the paragraph and select Paragraph and then select Outline level. For more information look at our FAQ #304 and FAQ #305.
  • Shapes and images anchored in front of text or behind text are placed at the anchor position. This might lead to different layout when converting to XHTML.
  • Images and shapes inserted as embedded Office objects (e.g. diagrams from PowerPoint or Excel) can't be converted to XHTML. Those images must be converted to pure images e.g. by taking a screenshot and inserting it.
  • Two-column text layout is not yet supported for MS Word to XHTML conversion.
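The heading-style requirement above can be checked before loading a document into the Tagger. A .docx file is a ZIP archive whose main part is word/document.xml, so the standard library is enough for a rough check. This is a sketch under assumptions: the function name is hypothetical, and it expects style IDs that start with "Heading", as in English versions of Word; localized installations may use different IDs.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace in ElementTree's {uri}tag notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def heading_paragraphs(docx_bytes):
    """Return (style id, text) for paragraphs whose style id starts with 'Heading'."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    headings = []
    for paragraph in root.iter(W + "p"):
        style = paragraph.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(W + "val", "") if style is not None else ""
        if style_id.startswith("Heading"):
            text = "".join(t.text or "" for t in paragraph.iter(W + "t"))
            headings.append((style_id, text))
    return headings
```

If the function returns an empty list for a report, the document probably uses manual formatting instead of heading styles, and the Tagger would have no chapters to navigate by.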

You can also check out the FAQ for the HTML Converter, where many questions on Word documents are answered.

How to Create a Compatible PDF From Word

The most reliable way to create an iXBRL-compatible PDF from Word is to use the PDF-export functionality from Adobe. For that, you will have to have Adobe Acrobat installed on your computer and then use the following settings:

