Skip to content

Warn on quoted attributes #1601

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jul 9, 2025

Conversation

dtpowl
Copy link
Contributor

@dtpowl dtpowl commented Jul 2, 2025

For #1599.

The old version of the parser allowed entity field attributes to be wrapped in quotation marks; the enclosing "s would get parsed away and the enclosed text passed through in the attribute text on the ParsedEntityDefinition.

I discussed this with @parsonsmatt , and we'd like to deprecate this behavior. In #1599 it seems to have been in use as a workaround for a parser bug that no longer exists. (If it's necessary to escape whitespace in a field attribute, this can still be done by wrapping the attribute in ( ) instead of quotes.)

This PR restores the old behavior and adds a configurable deprecation message as a warning. It's not easy to do this without some further refactoring of the parser. Banning quotes in entity field arguments will also ban them in entity field types, which would be incorrect — field types can include typelevel string literals.

This is a good opportunity to make the parser smarter. It's currently pretty naive — essentially, it breaks each line of an entity definition block into tokens and then stops. In doing this, it ignores a lot of genuine syntactic data. It doesn't know that the line starts with the field name, is followed by a type, and is then followed by a series of attributes. It's very easy to write a field definition that parses successfully but fails semantically.

Improving this situation is an incremental directional step towards a formal specification for the language.

This PR:

  • Implements a more exact parser for entity field definitions, including structural representation of the field name, type, strictness, and attributes.
  • Separates parsing of entity fields from non-field things like deriving statements. I'm calling these "directives" here. In the interest of keeping the PR from growing even more huge, directives are parsed naively for now.
    • I'd like to add support for Haskell-style deriving syntax in the future; this PR will make it easier to do that.
  • Temporarily restores the old parser's behavior for entity field definitions and directive arguments that are wrapped in quotes.
  • Adds a configurable warning message when entity field definitions and directive arguments are wrapped in quotes.

Currently, all of this new structure is thrown away in mkUnboundEntityDef. There's something of an impedance mismatch:

  • mkUnboundEntityDef doesn't care about the difference between field definitions and directives; it wants the parsed structure to be reduced back to a string of tokens.
  • mkUnboundEntityDef wants to re-parse types itself.
  • mkUnboundEntityDef wants to re-parse key/value attributes itself.
  • etc, etc, etc

I think this situation could be improved with some effort, but that's outside the scope of this PR.

@@ -186,7 +186,7 @@ User sql=big_user_table
This will alter the generated SQL to be:

@
CREATE TABEL big_user_table (
CREATE TABLE big_user_table (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated typo fix

@@ -784,9 +784,9 @@ check = do
convert
:: (Entity Vehicle, Maybe (Entity Bicycle), Maybe (Entity Car))
-> Vehicle'
convert (Entity _ (VehicycleBicycleSum _), Just (Entity _ (Bicycle brand)), _) =
convert (Entity _ (VehicleBicycleSum _), Just (Entity _ (Bicycle brand)), _) =
Bike brand
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two unrelated typo fixes

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol vehicycle

, entityUniques = entityConstraintDefsUniquesList entityConstraintDefs
, entityForeigns = []
, entityDerives = concat $ mapMaybe takeDerives textAttribs
, entityDerives = concat $ mapMaybe takeDerives (textFields ++ textDirectives)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid changing the external interface, we'll just smush the field definitions and directives together for now. We should improve this in the future.


entityConstraintDefs =
foldMap
(maybe mempty (takeConstraint ps entNameHS cols) . NEL.nonEmpty)
textAttribs
(textFields ++ textDirectives)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid changing the external interface, we'll just smush the field definitions and directives together for now. We should improve this in the future.

@@ -211,19 +211,18 @@ tryOrWarn msg p l r = do
else parseError err

-- | Attempts to parse with a provided parser. If it fails with an error matching
-- the provided predicate, it registers a delayed error with the provided message and falls
-- the provided predicate, it registers a delayed error and falls
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was inaccurate; the msg argument was actually unused. I'm not sure that's what we actually want, but fixing it seemed outside the scope of this PR.

tokenContent = \case
Quotation s -> s
Equality l r -> mconcat [l, "=", r]
attributeContent :: Attribute -> Text
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these fooContent methods exist in order to reserialize parsed data to text so it can be parsed again during code generation. I think this is silly, but removing it will be a pretty big refactor.


commentContent :: CommentToken -> Text
commentContent = \case
Comment s -> s
DocComment s -> s

quotedAttributeErrorMessage :: String
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We define a custom error message mostly so that we can fish it out again in tryOrReport, because it's otherwise not as easy as you'd think to determine when the quoted-attribute-forbidding parser has failed because of a quoted attribute as opposed to some other reason.

I don't love this but it works fine for now; a better solution would probably add 100+ LOC to this already-big PR.

deriving (Show, Eq)

-- | Parses a Persistent-style type expression.
-- Persistent's type expressions are largely similar to Haskell's, but with a few differences:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I won't be surprised if, while I'm testing packages that depend on persistent, I find some places where people are depending on a less restricted set of types than this parser admits. I'll make fixes here as needed, but this is one reason why it would be really good to have a formal specification for the DSL.

@dtpowl dtpowl marked this pull request as draft July 2, 2025 21:08
@dtpowl dtpowl force-pushed the warn-on-quoted-attributes branch from 532e65d to bf5dd20 Compare July 2, 2025 22:36
@dtpowl dtpowl force-pushed the warn-on-quoted-attributes branch from 034a9bb to 10515fc Compare July 3, 2025 23:36
@@ -784,9 +784,9 @@ check = do
convert
:: (Entity Vehicle, Maybe (Entity Bicycle), Maybe (Entity Car))
-> Vehicle'
convert (Entity _ (VehicycleBicycleSum _), Just (Entity _ (Bicycle brand)), _) =
convert (Entity _ (VehicleBicycleSum _), Just (Entity _ (Bicycle brand)), _) =
Bike brand
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol vehicycle

Comment on lines 62 to 63
typeExpr' :: ((MonadParsec e String) m) => Bool -> m TypeExpr
typeExpr' isInner = label "type expression" $ do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recommend creating a sum type instead of using bool here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right; that's nicer

@dtpowl dtpowl requested a review from parsonsmatt July 7, 2025 16:50
@dtpowl dtpowl marked this pull request as ready for review July 7, 2025 16:55
Copy link
Collaborator

@parsonsmatt parsonsmatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM! Let's put a changelog/version bump in the relevant places (ie persistent-qq, persistent-test were both modified). patch should be fine for persistent-qq, minor bump for persistent-test

--
-- @since 2.17.1.0
typeExprContent :: TypeExpr -> Text
typeExprContent = typeExprContent' False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe reuse IsInner here?

Comment on lines 128 to 146
describe "type parsing" $ do
let
parseType :: String -> ParseResult TypeExpr
parseType s = do
let
(warnings, res) = runConfiguredParser defaultPersistSettings initialExtraState innerTypeExpr "" s
case res of
Left peb -> (warnings, Left peb)
Right (te, _acc) -> (warnings, Right te)

isType typeStr expectedTypeExpr = do
let (_warnings, Right te) = parseType typeStr
te `shouldBe` expectedTypeExpr
typeExprContent te `shouldBe` T.pack typeStr

-- these are some helper functions to make expectations less verbose
simpleType s = (TypeApplication (TypeConstructorExpr (TypeConstructor s)) [])
typeApp s ts = (TypeApplication (TypeConstructorExpr (TypeConstructor s)) ts)
listOf t = (TypeApplication (TypeConstructorExpr ListConstructor) [t])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this appears to be mixing 2 and 4 space indent, run fourmolu pls?

(For now, let's put {- fourmolu disable -} comments in the files that are failing this

Comment on lines 147 to 166

it "parses types of kind '*'" $ do
"String" `isType` simpleType "String"

it "parses type constructors with dots" $ do
"ThisIs.AType" `isType` simpleType "ThisIs.AType"

it "parses higher-kinded types" $ do
"Maybe String" `isType` typeApp "Maybe" [simpleType "String"]

it "is greedy when parsing arguments to a type constructor" $ do
"Map String Int" `isType` typeApp "Map" [simpleType "String", simpleType "Int"]

it "parses higher-kinded types when parameterized by complex types (1)" $ do
"Map String (Maybe [Int])" `isType`
typeApp "Map" [simpleType "String", typeApp "Maybe" [listOf (simpleType "Int")]]

it "parses higher-kinded types when parameterized by complex types (2)" $ do
"Map (Maybe Int) [Int]" `isType`
typeApp "Map" [(typeApp "Maybe" [simpleType "Int"]), listOf (simpleType "Int")]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice tests

Comment on lines 561 to 565
c <- char '!' <|> char '~'
case c of
'!' -> pure Strict
'~' -> pure Lazy
_ -> error "unreachable"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can avoid error here

Suggested change
c <- char '!' <|> char '~'
case c of
'!' -> pure Strict
'~' -> pure Lazy
_ -> error "unreachable"
(Strict <$ char '!') <|> (Lazy <$ char '~')

<$ is an operator that does:

(<$) :: (Functor f) => a -> f b -> f a
a <$ fb = fmap (const a) fb

Or, monadically,

a <$ fb = do
  _ <- fb
  pure a

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, lovely

Comment on lines 820 to 839
entityField :: Parser Member
entityField = do
dcb <- getDcb
pos <- getSourcePos
ss <- optional fieldStrictness
fn <- L.lexeme spaceConsumer fieldName
ft <- L.lexeme spaceConsumer typeExpr -- Note that `typeExpr` consumes outer parentheses.
fa <- optional $ L.lexeme spaceConsumer (many attribute)
_ <- setLastDocumentablePosition
lookAhead (void newline <|> eof)
pure $
MemberEntityField
EntityField
{ entityFieldDocCommentBlock = dcb
, entityFieldStrictness = ss
, entityFieldName = fn
, entityFieldType = ft
, entityFieldAttributes = fromMaybe [] fa
, entityFieldPos = pos
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that's a pretty parser 👏🏻

@dtpowl dtpowl force-pushed the warn-on-quoted-attributes branch from 27dc131 to 6e595e5 Compare July 7, 2025 19:54
@dtpowl dtpowl requested a review from parsonsmatt July 7, 2025 21:05
@parsonsmatt parsonsmatt merged commit 44adecb into yesodweb:master Jul 9, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants