JSX and React: A Komodo Syntax Highlighting Case Study

An often overlooked feature in Komodo is the ability to create extensions for highlighting the syntax of new and custom languages. It seems that every week brings news of a new language on the block. The good news is that with a little bit of work, you can start using that language from within Komodo!
In the web-programming world, a JavaScript framework called [React](https://facebook.github.io/react/) is growing in popularity. React is “a JavaScript library for building user interfaces”. Since user interfaces are ultimately specified in HTML markup, creating them in pure JavaScript has always been cumbersome. The React framework has created a custom language called JSX to simplify the process. JSX explicitly allows HTML markup to be specified within JavaScript. Here’s an example:
~~~
// main.js
var React = require('react');
var ReactDOM = require('react-dom');

ReactDOM.render(
  <h1>Hello, world!</h1>,
  document.getElementById('example')
);
~~~
Note that when you attempt to load a JSX file in Komodo, there are some problems: not only is the HTML markup highlighted incorrectly, but an inaccurate syntax error is also reported.
The next version of Komodo will support proper JSX syntax highlighting, and this was achieved using tools already built into Komodo. In fact, you could create syntax highlighting for React today!
This post will demonstrate just how powerful Komodo’s ability to create syntax highlighting extensions for custom languages is.
## Side Note: Creating a User-Defined Language
Komodo provides the ability to create a “user-defined language” (or UDL for short) out of the box. You can create one by navigating to the “Project” menu and then selecting “New from Template” followed by “Create Komodo Language”. A wizard will prompt you for some details on your language and then create the scaffold for your Komodo extension.
## Luddite
The syntax highlighting for custom languages is specified using a powerful language called Luddite that ActiveState created specifically for this purpose. Luddite allows you to create highlighting for a single language (such as JavaScript) or multiple languages (such as HTML that allows embedded JavaScript and CSS). A reference on Luddite is available in [Komodo’s documentation](http://docs.komodoide.com/SDK/udl).
In brief, Luddite describes a state machine. Certain patterns recognized within certain states trigger transitions to other states, each with its own recognized patterns for further transitions. Each state usually has its own rules for syntax highlighting chunks of text. For example, in an HTML file with embedded JavaScript, encountering `</script>` triggers a transition back to the default markup state (which has no concept of JavaScript syntax, so subsequent text is highlighted as HTML). The topic of state machines is vast and complicated, so you are encouraged to view the link above for more information. The rest of this post will assume you are vaguely familiar with that material.
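To make the state-machine idea concrete, here is a minimal sketch in plain JavaScript rather than Luddite. It assumes just two states, `markup` and `js`, and two triggering patterns (`<script>` and `</script>`); the `transitions` table and the `segment` function are made up for illustration and are not part of Komodo.

```javascript
// Illustrative two-state highlighter: NOT Luddite, just the same idea in plain JavaScript.
// Each state names the pattern that triggers a transition and the state to move to.
const transitions = {
  markup: { pattern: /<script>/, next: "js" },
  js: { pattern: /<\/script>/, next: "markup" },
};

// Split `text` into chunks, each labelled with the state that would style it.
function segment(text, state = "markup") {
  const chunks = [];
  let rest = text;
  while (rest.length > 0) {
    const { pattern, next } = transitions[state];
    const match = pattern.exec(rest);
    if (!match) {
      // No further transition: the remainder belongs to the current state.
      chunks.push({ state, text: rest });
      break;
    }
    // The triggering tag is itself markup, so it always ends up in a markup chunk.
    const end = state === "markup" ? match.index + match[0].length : match.index;
    if (end > 0) chunks.push({ state, text: rest.slice(0, end) });
    rest = rest.slice(end);
    state = next;
  }
  return chunks;
}

const chunks = segment("<p>hi</p><script>var a = 1;</script><p>bye</p>");
for (const c of chunks) console.log(c.state + ": " + c.text);
```

The middle chunk (`var a = 1;`) is the only one labelled `js`; everything else, including both `script` tags, stays in `markup`. A real highlighter has many more states, but the shape is the same.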
## Luddite and JSX
JSX consists of JavaScript and embedded HTML. That embedded HTML may also re-embed JavaScript within templating tags. Sound confusing? It is, but Luddite makes short work of this scenario. Not only that, but we can leverage Komodo’s existing HTML and JavaScript syntax highlighting routines without needing to re-invent the wheel for JSX — correct JSX highlighting can be specified in as few as 35 lines of code! As an additional bonus for using existing code, we get embedded CSS in HTML for free.
### Identifying Transition Points
JSX has two kinds of transition: from JavaScript (the default language) to HTML, and from HTML back to JavaScript. These occur in two different round trips. The first, from JS to HTML and back, occurs when specifying HTML markup directly in JS. The second, from HTML to JS and back, occurs when specifying JS within HTML markup via React’s `{ … }` templating mechanism.
Here’s an example that contains two instances of the first transition using embedded HTML markup:
~~~
// tutorial1.js
var CommentBox = React.createClass({
  render: function() {
    return (
      <div className="commentBox">
        Hello, world! I am a CommentBox.
      </div>
    );
  }
});
ReactDOM.render(
  <CommentBox />,
  document.getElementById('content')
);
~~~
Note that the `<div></div>` case and the `<CommentBox />` case are inherently different. The first needs to also highlight everything within the matching tags as HTML, while the second needs to stop highlighting at the end of the single tag.
For this first transition case, `<` marks the beginning of a JS to HTML transition, but only if a tag name occurs immediately after it (i.e. no whitespace). Otherwise, comparisons like `a < b` would be mistakenly interpreted as transitions. In Luddite, the JS to HTML transition would look like this:
~~~
family csl
# Look for the beginning of an HTML tag and mark it as a delimiter. In HTML now.
state IN_CSL_DEFAULT:
/<([\w:-]+)/: paint(upto, CSL_DEFAULT), set_delimiter(1), redo => IN_M_DEFAULT
~~~
The English translation of this code is “from within the default JavaScript client-side-language (CSL) state, when you come across text that matches `<` followed by a word, make a note of that word and transition into the default HTML markup (M) state”. Where do the `IN_CSL_DEFAULT` and `IN_M_DEFAULT` states come from? They are defined in Komodo’s JavaScript and HTML syntax highlighting routines, respectively (“jslex.udl” and “html.udl”).
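If you want to see what the pattern in that rule does and does not match, you can try the same regular expression in plain JavaScript. This is only a quick check of the pattern itself; the `tagNameAt` helper is made up for the example and says nothing about how Komodo applies the rule internally.

```javascript
// The same pattern used in the Luddite rule, tried against a few JavaScript snippets.
const tagStart = /<([\w:-]+)/;

// Returns the captured tag name, or null when "<" is not followed directly by a name.
function tagNameAt(source) {
  const match = tagStart.exec(source);
  return match ? match[1] : null;
}

console.log(tagNameAt("return <div>"));   // div
console.log(tagNameAt("if (a < b) {}"));  // null
console.log(tagNameAt("<CommentBox />")); // CommentBox
console.log(tagNameAt("x = a <b"));       // b
```

The last line is worth noticing: taken by itself, the pattern also matches a comparison written without a space after `<`, which is exactly why the whitespace requirement matters.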
Now the HTML back to JS transition is more complicated and looks like this:
~~~
family markup
# When in HTML, look for the beginning of a closing tag. The tag will need to be
# checked to see if it is a delimiter.
state IN_M_DEFAULT:
'</': paint(upto, M_DEFAULT), paint(include, M_ETAGO) => IN_M_ETAG_JSX
# At the beginning of a closing tag. Check for delimiter. If so, we'll need to
# confirm it stands alone (i.e. only whitespace or '>' after it). If there is no
# delimiter, fall back on the default HTML handling.
state IN_M_ETAG_JSX:
delimiter: keep_delimiter, paint(upto, M_TAGNAME) => IN_M_ETAG_DELIM
/./: redo => IN_M_ETAG_1 # fall back on the default
# At the beginning of a closing delimiter. Verify it indeed is the delimiter
# we're looking for and make the transition back to JSX.
state IN_M_ETAG_DELIM:
/\s+/: # stay
'>': clear_delimiter, paint(include, M_ETAGC) => IN_CSL_DEFAULT
/./: redo => IN_M_ETAG_1
~~~
The English translation of this code is “from within the default HTML markup state, when you come across ‘</’, transition to an intermediate state that checks to see if what follows is the matching end tag of the original noted tag that began the transition from JS to HTML. If so, go until the tag’s trailing ‘>’ and then transition back to the default JavaScript state. (If not, transition back to the state that handles normal behavior and remain in an HTML state.)”
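The delimiter idea can be sketched outside Luddite too. The following plain JavaScript remembers the tag name that opened the markup and then looks for the closing tag with that same name, allowing whitespace before `>`. The `findMarkupSpan` function is hypothetical and deliberately simple: it does not handle nested tags of the same name, and it is not a description of how Komodo implements `set_delimiter`.

```javascript
// Hypothetical helper (not part of Komodo): find the first embedded markup span.
function findMarkupSpan(source) {
  const open = /<([\w:-]+)/.exec(source);
  if (!open) return null;
  const delimiter = open[1]; // the remembered tag name, comparable to set_delimiter(1)

  // Tag names only contain word characters, ":" and "-", none of which are
  // special outside a character class, so the name can be used in a pattern as-is.
  const close = new RegExp("</" + delimiter + "\\s*>", "g");
  close.lastIndex = open.index; // start searching at the opening tag
  const end = close.exec(source);
  if (!end) return null;

  const stop = end.index + end[0].length;
  return {
    delimiter,
    markup: source.slice(open.index, stop), // would be highlighted as HTML
    rest: source.slice(stop),               // back in JavaScript
  };
}

const span = findMarkupSpan("return (<div><b>Hi</b> there</div >);");
console.log(span.delimiter); // div
console.log(span.markup);    // <div><b>Hi</b> there</div >
console.log(span.rest);      // );
```

Note how `</b>` is skipped: it is a closing tag, but not the one that was noted when the markup began, so the sketch stays in HTML until it reaches `</div >`.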
However, the code above only applies to an HTML tag pair like `<div></div>`. For stand-alone tags, this Luddite code is needed:
~~~
family markup
# If the HTML tag is stand-alone, transition immediately back to JSX instead of
# waiting for the corresponding end tag.
state IN_M_STAG_POST_TAGNAME:
'/>': paint(upto, M_TAGSPACE), paint(include, M_EMP_TAGC) => IN_CSL_DEFAULT
state IN_M_STAG_POST_ATTRNAME_2:
'/>': paint(upto, M_TAGSPACE), paint(include, M_EMP_TAGC) => IN_CSL_DEFAULT
~~~
No English translation is needed here, but it’s worth noting that again, those state names are defined and used within Komodo’s HTML highlighting routines (“html.udl”).
Going back to the second kind of transition (the one of HTML to JS and back via React’s templating mechanism), here’s an example:
~~~
var Comment = React.createClass({
  render: function() {
    return (
      <div className="comment">
        <h2 className="commentAuthor">
          {this.props.author}
        </h2>
        {this.props.children}
      </div>
    );
  }
});
~~~
The Luddite code to handle this is more straightforward:
~~~
family markup
# When in HTML, look for a '{', which transitions back to JSX.
state IN_M_DEFAULT:
'{': paint(upto, M_DEFAULT), spush_check(IN_M_DEFAULT), redo => IN_CSL_DEFAULT
state IN_M_STAG_POST_ATTRNAME_2:
'{': paint(upto, M_DEFAULT), spush_check(IN_M_STAG_POST_TAGNAME), redo => IN_CSL_DEFAULT
family csl
# When '{' transitioned from HTML to JSX, look for the matching '}' to
# transition back to HTML.
state IN_CSL_DEFAULT:
'}': paint(upto, CSL_DEFAULT), paint(include, SSL_OPERATOR), spop_check => IN_CSL_DEFAULT
~~~
Note the `spush_check` and `spop_check` UDL keywords take care of ensuring nested curly braces are handled properly.
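One way to picture a state stack is as a list of “where to go back to” entries. The sketch below is plain JavaScript and only models that general idea; the `StateStack` class and its method names are invented for the example, and the assumption that popping an empty stack simply leaves the state unchanged is mine, not something taken from the Luddite reference.

```javascript
// Illustrative model of a stack of return states. This is an assumption-laden
// sketch for explanation, not Komodo's implementation of spush_check/spop_check.
class StateStack {
  constructor(initial) {
    this.current = initial;
    this.stack = [];
  }
  // Plain transition: change state without remembering anything.
  goto(next) {
    this.current = next;
  }
  // Remember where to come back to, then enter the new state.
  pushAndGoto(returnTo, next) {
    this.stack.push(returnTo);
    this.current = next;
  }
  // Return to the most recently remembered state; with nothing pushed, stay put.
  popIfAny() {
    if (this.stack.length > 0) this.current = this.stack.pop();
  }
}

// Walk through: <ul>{items.map(x => <li>{x}</li>)}</ul>
const s = new StateStack("markup");     // inside <ul>
s.pushAndGoto("markup", "js");          // "{": remember markup, enter JavaScript
s.goto("markup");                       // "<li": nested markup, nothing pushed
s.pushAndGoto("markup", "js");          // "{": second level of nesting
s.popIfAny();                           // "}": back to the <li> markup
console.log(s.current);                 // markup
s.goto("js");                           // "</li>": back in the JavaScript expression
s.popIfAny();                           // "}": back to the <ul> markup
console.log(s.current, s.stack.length); // markup 0
```

Because each `{` pushes its own return state, the inner `}` returns to the inner markup and the outer `}` returns to the outer markup, regardless of how deep the nesting goes.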
## Tying It All Together
Now that JSX transitions have been defined, all that remains is to tie everything together:
~~~
language JSX
initial IN_M_DEFAULT_TRANSITION
family csl
# Look for the beginning of an HTML tag and mark it as a delimiter. In HTML now.
state IN_CSL_DEFAULT:
/<([\w:-]+)/: paint(upto, CSL_DEFAULT), set_delimiter(1), redo => IN_M_DEFAULT
# When '{' transitioned from HTML to JSX, look for the matching '}' to
# transition back to HTML.
state IN_CSL_DEFAULT:
'}': paint(upto, CSL_DEFAULT), paint(include, SSL_OPERATOR), spop_check => IN_CSL_DEFAULT
family markup
# If the HTML tag is stand-alone, transition immediately back to JSX instead of
# waiting for the corresponding end tag.
state IN_M_STAG_POST_TAGNAME:
'/>': paint(upto, M_TAGSPACE), paint(include, M_EMP_TAGC) => IN_CSL_DEFAULT
state IN_M_STAG_POST_ATTRNAME_2:
'/>': paint(upto, M_TAGSPACE), paint(include, M_EMP_TAGC) => IN_CSL_DEFAULT
# When in HTML, look for the beginning of a closing tag. The tag will need to be
# checked to see if it is a delimiter.
state IN_M_DEFAULT:
'</': paint(upto, M_DEFAULT), paint(include, M_ETAGO) => IN_M_ETAG_JSX
# At the beginning of a closing tag. Check for delimiter. If so, we'll need to
# confirm it stands alone (i.e. only whitespace or '>' after it). If there is no
# delimiter, fall back on the default HTML handling.
state IN_M_ETAG_JSX:
delimiter: keep_delimiter, paint(upto, M_TAGNAME) => IN_M_ETAG_DELIM
/./: redo => IN_M_ETAG_1 # fall back on the default
# At the beginning of a closing delimiter. Verify it indeed is the delimiter
# we're looking for and make the transition back to JSX.
state IN_M_ETAG_DELIM:
/\s+/: # stay
'>': clear_delimiter, paint(include, M_ETAGC) => IN_CSL_DEFAULT
/./: redo => IN_M_ETAG_1
# When in HTML, look for a '{', which transitions back to JSX.
state IN_M_DEFAULT:
'{': paint(upto, M_DEFAULT), spush_check(IN_M_DEFAULT), redo => IN_CSL_DEFAULT
state IN_M_STAG_POST_ATTRNAME_2:
'{': paint(upto, M_DEFAULT), spush_check(IN_M_STAG_POST_TAGNAME), redo => IN_CSL_DEFAULT
include "html2css.udl"
include "css2html.udl"
include "jslex.udl"
include "html.udl"
include "csslex.udl"
family markup
# Immediately hand highlighting off to JavaScript since it starts in markup
# by default.
state IN_M_DEFAULT_TRANSITION:
/./: redo => IN_CSL_DEFAULT
~~~
As you can see, we’re making use of Komodo’s existing UDLs for HTML, CSS, and JavaScript. All that’s needed for JSX support is to “insert” states and transitions in the right places. The only remaining point worth noting is that last `IN_M_DEFAULT_TRANSITION` state. It looks odd. Why is it there? Why can’t we use `initial IN_CSL_DEFAULT` at the top? By design, multi-language syntax highlighters start in the markup language (in this case, HTML). In order to begin highlighting in a sub-language like JavaScript (the `IN_CSL_DEFAULT` state), you have to create a “pseudo-state” that immediately transitions into the desired starting state.
![JSX Syntax Highlighting Screenshot – Komodo IDE](/assets/images/contentful/jsx.png)
That’s all there is to it!
