nasg/pandoc.py

__author__ = "Peter Molnar"
__copyright__ = "Copyright 2017-2018, Peter Molnar"
__license__ = "apache-2.0"
__maintainer__ = "Peter Molnar"
__email__ = "mail@petermolnar.net"

import subprocess
import logging

class Pandoc(str):
    def __new__(cls, text):
        """ Pandoc command line call with piped in- and output """
        cmd = (
            'pandoc',
            '-o-',
            '--from=markdown+%s' % (
                '+'.join([
                    'footnotes',
                    'pipe_tables',
                    'strikeout',
                    #'superscript',
                    #'subscript',
                    'raw_html',
                    'definition_lists',
                    'backtick_code_blocks',
                    'fenced_code_attributes',
                    'shortcut_reference_links',
                    'lists_without_preceding_blankline',
                    'autolink_bare_uris',
                ])
            ),
            '--to=html5',
            '--quiet',
            '--no-highlight'
        )
        p = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )

        stdout, stderr = p.communicate(input=text.encode())
        if stderr:
            logging.warning(
                "Error during pandoc convert:\n\t%s\n\t%s",
                cmd,
                stderr
            )
        r = stdout.decode('utf-8').strip()
        return str.__new__(cls, r)
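
The conversion happens in `__new__` rather than `__init__` because `str` is immutable: a subclass has to compute its value before the object is created. Below is a minimal sketch of the same pattern without the pandoc dependency. The `Shout` class and its transformation are hypothetical and serve only as illustration.

```python
class Shout(str):
    """Hypothetical str subclass whose value is computed in __new__."""

    def __new__(cls, text):
        # str is immutable, so the final value must be passed to
        # str.__new__; assigning it later in __init__ is not possible.
        value = text.strip().upper()
        return str.__new__(cls, value)


s = Shout("  hello ")
print(s)                   # behaves like an ordinary string
print(isinstance(s, str))  # and passes isinstance checks for str
```

The result can be passed anywhere a plain string is expected, which is presumably why `Pandoc(text)` can be used directly as the converted HTML.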
2018-12-03 10:36:10 +00:00
After long discussions, mainly listening and reading, I'm giving up on GPL licensing and moving to Apache 2.0. The short summary is that while I still sort of believe in what GPL stands for, in reality it is not that simple to open source everything immediately. Because GPL is scary, many people avoid it, and one of the main achievements of open source should be that nobody has to reinvent the wheel.

2018-08-08 09:42:42 +01:00
- removed python markdown traces
- fixed gone_re php regexes
- better nav header layout
- follow page instead of direct RSS link
- logger in settings for nasg instead of direct logging
- prism.js only for articles with language code blocks
- added webmention.io webhook to email template

2018-08-04 09:30:26 +01:00
- CSS fixes and simplifications
- prism.js inlined (only for entries with code blocks)
- Pandoc is a subclass of str now
- added 'nasg' logger
- minor bugfixes

2018-08-04 00:28:55 +01:00
Back To Pandoc
So, Python Markdown is a bottomless pit of horrors, including crippling parsing bugs, random   out of nowhere, and a lack of features. It's definitely much faster than Pandoc, but Pandoc doesn't fall apart when a fenced code block contains a regex that happens to match markdown elements. Also added some ugly post-processing string replacements to make Pandoc's fenced code output work with Prism: instead of Pandoc's <pre class="codelang"><code>, Prism wants <pre><code class="language-codelang">, so I added a regex sub, because it's 00:32.
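
The Prism fix described in the "Back To Pandoc" commit note could look roughly like the following. This is a minimal sketch, not the actual nasg code: the name `prismify` and the exact pattern are assumptions, and it presumes Pandoc emits a single language class on `<pre>` as the note describes.

```python
import re

# Assumed shape of Pandoc's output for a fenced block with a language:
#   <pre class="LANG"><code>...</code></pre>
PANDOC_PRE = re.compile(r'<pre class="([^"]+)"><code>')


def prismify(html):
    """Move the language class from <pre> to <code>, prefixed with 'language-'."""
    return PANDOC_PRE.sub(r'<pre><code class="language-\1">', html)


sample = '<pre class="python"><code>x = 1</code></pre>'
print(prismify(sample))
```

Blocks without a class attribute do not match the pattern and pass through unchanged.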