Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VRMLLoader: Update the token pattern for identifier to load VRML files from KiCad. #27543

Merged
merged 4 commits into from
Jan 12, 2024

Conversation

liangtie
Copy link
Contributor

@liangtie liangtie commented Jan 11, 2024

char '-' is allowded to be used inside the identifiers for footprints in Kicad , e.g. 'PIN-01' , 'IC-LABEL-01'

DIP-8_W8.89mm_SMDSocket.wrl.json

…Kicad

char '-' is allowded to be used inside the identifiers for footprints in Kicad , e.g.  'PIN-01' , 'IC-LABEL-01'
@liangtie liangtie changed the title fix: Update the token pattern for Identifier to load VRML files from … Jan 11, 2024
@liangtie liangtie closed this Jan 11, 2024
@liangtie liangtie reopened this Jan 11, 2024
@liangtie
Copy link
Contributor Author

http://gun.teipir.gr/VRML-amgem/spec/part1/grammar.html

Id:
IdFirstChar
IdFirstChar IdRestChars
IdFirstChar:
Any ISO-10646 character encoded using UTF-8 except: 0x30-0x39, 0x0-0x20, 0x22, 0x23, 0x27, 0x2c, 0x2e, 0x5b, 0x5c, 0x5d, 0x7b, 0x7d.
IdRestChars:
Any number of ISO-10646 characters except: 0x0-0x20, 0x22, 0x23, 0x27, 0x2c, 0x2e, 0x5b, 0x5c, 0x5d, 0x7b, 0x7d.

'-'( 0x2d ) is allowed in "IdRestChars" by the VRML spec

@Mugen87
Copy link
Collaborator

Mugen87 commented Jan 11, 2024

For testing: https://rawcdn.githack.com/Liangtie/three.js/dev/examples/webgl_loader_vrml.html

We have tried implementing this some time ago but we had to revert since other VRML assets failed loading with the change. Original PR: #20321.

At that time, 0x2d was directly used in the regex. I see you are doing it differently now. Do you mind explaining why we can't use 0x2d which seems a bit cleaner than ([-]??

@Mugen87
Copy link
Collaborator

Mugen87 commented Jan 11, 2024

I still wonder why the PR introduces the additional brackets and question mark. Why isn't it possible to treat 0x2d like all other tokens?

@liangtie
Copy link
Contributor Author

A minimal demo to reproduce the issue :

import chevrotain from '../libs/chevrotain.module.min.js';

const createToken = chevrotain.createToken;

const RouteIdentifier = createToken( { name: 'RouteIdentifier', pattern: /[^\x30-\x39\0-\x20\x22\x27\x23\x2b\x2c\x2d\x2e\x5b\x5d\x5c\x7b\x7d][^\0-\x20\x22\x27\x23\x2b\x2c\x2d\x2e\x5b\x5d\x5c\x7b\x7d]*[\.][^\x30-\x39\0-\x20\x22\x27\x23\x2b\x2c\x2d\x2e\x5b\x5d\x5c\x7b\x7d][^\0-\x20\x22\x27\x23\x2b\x2c\x2d\x2e\x5b\x5d\x5c\x7b\x7d]*/ } );
const Identifier = createToken( { name: 'Identifier', pattern: /[^\x30-\x39\0-\x20\x22\x27\x23\x2b\x2c\x2d\x2e\x5b\x5d\x5c\x7b\x7d]([^\0-\x20\x22\x27\x23\x2b\x2c\x2d\x2e\x5b\x5d\x5c\x7b\x7d])*/, longer_alt: RouteIdentifier } );
const lexer = new chevrotain.Lexer( [ Identifier, RouteIdentifier ] );

const lexingResult = lexer.tokenize( 'PIN-01' );
console.log( lexingResult );

it will get the following output :

$ node test.js 
Warning: Warning: No LINE_BREAKS Found.
        This Lexer has been defined to track line and column information,
        But none of the Token Types can be identified as matching a line terminator.
        See https://chevrotain.io/docs/guide/resolving_lexer_errors.html#LINE_BREAKS
        for details.
{
  tokens: [
    {
      image: 'PIN',
      startOffset: 0,
      endOffset: 2,
      startLine: 1,
      endLine: 1,
      startColumn: 1,
      endColumn: 3,
      tokenTypeIdx: 4,
      tokenType: [Object]
    }
  ],
  groups: {},
  errors: [
    {
      offset: 3,
      line: 1,
      column: 4,
      length: 3,
      message: 'unexpected character: ->-<- at offset: 3, skipped 3 characters.'
    }
  ]
}
@Mugen87
Copy link
Collaborator

Mugen87 commented Jan 12, 2024

After studying #20321 again and this PR I think the situation is as follows:

At the beginning, the loader was strictly developed according to the specification of http://gun.teipir.gr/VRML-amgem/spec/part1/concepts.html#SyntaxBasics. That means - is not allowed in field, event, prototype, and node names (or short "identifiers").

We have agreed to make an exception to better support KiCad (see #18298 (comment)) but #20321 failed because it removed \x2d at the wrong position. The first part of the regex only targets the first character and we must ensure - does not appear at this place in identifiers. However, they are allowed to appear "inside" identifiers so updating the second part of the regex is correct. In this way identifiers like PIN-01 or IC-LABEL-01 should work.

I have verified all test files in our webgl_loader_vrml example and they work with the current state of the PR. I just hope something like PIN-01- won't cause any issue (meaning when a - appears as a last character). But I think it's best to find this out by using the change in production. If there are any issues, we can easily update the regex again.

@Mugen87 Mugen87 merged commit 6c7cf52 into mrdoob:dev Jan 12, 2024
11 checks passed
@Mugen87 Mugen87 added this to the r161 milestone Jan 12, 2024
@Mugen87 Mugen87 changed the title fix: Update the token pattern for Identifier to load VRMLs from Kicad Jan 12, 2024
AdaRoseCannon pushed a commit to AdaRoseCannon/three.js that referenced this pull request Jan 15, 2024
…mrdoob#27543)

* fix: Update the token pattern for Identifier to load VRML files from Kicad

char '-' is allowded to be used inside the identifiers for footprints in Kicad , e.g.  'PIN-01' , 'IC-LABEL-01'

* style: Replace '-' with '0x2d' in the pattern matching the identifier for vrml

* Revert "style: Replace '-' with '0x2d' in the pattern matching the identifier for vrml"

This reverts commit 11a4c67.

* Remove '0x2d' from the negation set
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants