MVP CLI
Names:
- magic search (more catchy for search)
- code query (the best for set of tools)
- code magic (taken :/)
- magic code search (kinda too long)
- Quecode
- CodeQ
- CodeQue
✅ Fix bug with <$>$</$>;
matching too much - JSX text wildcard acts like $$ o.O
✅ restrict more than 2 wildcards on query parse level
✅ Adjust formatting of multiline code that is starting after some tokens
✅ Make CLI a product
- ✅ codeframe from babel
- ✅ investigate results formatting query :
<Text $="ellipsis" ></Text>
- how we can present original code instead of generated one
- ✅ fix problem with 0 padding
- ✅ commander
- ✅ spinner while search
- ✅ results limit param
- ✅ convenient multiline input
- ✅ find better tokenizer (fixed js-tokens)
- ✅ file path query
- ✅ runs in cwd
❌ Explore types matching and types literals ->
tests on custom file
❌ Bug with JSX text compare -> found only once instead of twice
<Text
textAlign="center"
mb={4}
fontWeight={600}
lineHeight={1.3}
fontSize="xl"
>
Please complete the short and completely confidential personality
questionnaire.
</Text>
❌ Try to make type declarations optional in include mode. Right now, if code has types (eg. a return type), it cannot be found by a query that omits them
❌ Add search by files changed since last commit
✅ Investigate why node_modules search does not work
❌ add tests block statement search (queue example in catch block and function block)
✅ parse errors should not crash whole search
✅ see if async can speed up files search
✅ Market research on eslint and babel auto plugins
✅ Try
tsc --extendedDiagnostics
How long stuff took and how big/complex it is.
tsc --listFiles
List of every file included in the compilation.
tsc --explainFiles
List of every file and why it's included.
- https://github.com/amcasey/ts-analyze-trace
❌ add search by dependencies using dpdm
✅ Think if we can solve problems with exports using rev-dep
- how to get root files in rev-dep
- how many roots it would find
- look for dependencies analysis extensions
- how we could use rev-dep in vscode ext
- I NEED babel-plugin-undo-reexport
✅ ensure create debug and other trash is not in pkg, install fresh pkg on linux to save space
✅ Bug with code generation for <$ $={
${$$}} />;
// use-case: probably redundant template literal (value can be not a string)
- implement tests for this
- implement tests for
<$ $={
fdsgg} />;
// use-case: redundant template literal - template literal seems to not work properly in exact mode
✅ Support wildcards in JSXText
✅ Support for case insensitive search
- only for wildcards for now
- actually it might be easy, we should check if primitive value is string
✅ Support json
✅ Bundle/minify/obfuscate
❌ Invent / Implement license mechanism
-
✅ try webassembly
-
✅ Cleanup rust code
-
✅ cleanup rust deps
-
✅ cleanup build chunks
-
✅ add wasm files to package script
-
✅ obfuscate wasm identifiers
-
RSA-SHA256 License
- ability to renew seamlessly
- ideally can validate license with public key, but create with private key
- non-ideally - both keys (or one common key) are used to create and validate
- license is stored on device and not removed on update
- ✅ consider calling RSA crypto from rust
- key can be stored on js or rust
- result can be obtained in rust
- to intercept, an attacker would have to modify / override crypto impl
- Consideration result: Let's use AES for now and stop overthinking this security :D it might not be an issue if the software does not sell well
- investigate how to integrate tools like Paddle, Stripe, Gumroad, Ko-fi for payments / memberships
-
temp key: dSgVkXp2s5v8y/B?
-
❌ local license key store
- to survive lib update
- to survive vscode update
- to survive vscode ext update
- need to save in user home directory
- need to find package to handle that
- we can use os.homedir() and .codeque file
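A minimal sketch of the home-directory store mentioned above, using only Node built-ins. The helper names (`licenseFilePath`, `saveLicenseKey`, `readLicenseKey`), the plain-text file format and the injectable `baseDir` parameter are assumptions made for illustration, not the actual implementation.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumption: the key is stored as plain text in `<home>/.codeque`.
// `baseDir` is injectable so the helpers can be tried outside the real home directory.
const licenseFilePath = (baseDir: string = os.homedir()): string =>
  path.join(baseDir, ".codeque");

function saveLicenseKey(key: string, baseDir?: string): void {
  // Mode 0o600 restricts the file to the current user on POSIX systems.
  fs.writeFileSync(licenseFilePath(baseDir), key, { encoding: "utf8", mode: 0o600 });
}

function readLicenseKey(baseDir?: string): string | null {
  try {
    return fs.readFileSync(licenseFilePath(baseDir), "utf8").trim();
  } catch {
    return null; // no stored key yet
  }
}
```

Because the file lives outside the package and the extension directories, it should survive lib, vscode and vscode ext updates.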
-
💡 cli set license
-
cli authorize via github (later)
-
One key can be shared between many users -> company key does not make sense if they don't want to use eslint
- maybe we can generate key with device footprint, then we could validate footprint
- footprint could be embedded in signature, so we need footprint either from JS or from Rust
- footprint could be sha2 from some os properties - this could be reliable
- one license = up to 3 active footprints
- can we use sha identity for this?
- we can, it would be useful for CI servers
- for human users we would require to sign in with github, which would return license key
- some can still figure out that key is stored locally and they can copy it
- if we replace key frequently, that wouldn't be worth cheating
- footprint is a good idea
- sha for CI could be used by some users to get access to search for many ppl of the company
- sha access could be granted for 1h or some short-period of time
- license key is same as it is now
- how we can register footprint from CI server running eslint checks, would that be stable ?
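A rough sketch of the footprint idea from the list above: a SHA-256 over a few OS properties, plus the "up to 3 active footprints" rule. Which properties are hashed, and the names `deviceFootprint` and `activateFootprint`, are assumptions. Whether these properties are stable enough (especially on CI servers) is exactly the open question noted above.

```typescript
import { createHash } from "node:crypto";
import * as os from "node:os";

// Assumption: hostname, platform, arch and CPU model identify a device well enough.
function deviceFootprint(): string {
  const parts = [os.hostname(), os.platform(), os.arch(), os.cpus()[0]?.model ?? ""];
  return createHash("sha256").update(parts.join("|")).digest("hex");
}

const MAX_ACTIVE_FOOTPRINTS = 3;

// Returns the updated list of active footprints,
// or null when the license is already used on 3 other devices.
function activateFootprint(active: string[], footprint: string): string[] | null {
  if (active.includes(footprint)) return active;
  if (active.length >= MAX_ACTIVE_FOOTPRINTS) return null;
  return [...active, footprint];
}
```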
-
Each account could have its own .wasm generated with custom key
- problem with versioning/updates
- we could generate wasm build on the fly if needed
- wasm build could be loaded async and cached
- act like a 2 factor auth. needs matching key and lock
- it would kind of secure flaky AES on wasm
- how we would fetch proper .wasm ?
- organization id + user id/email + npm pkg version
- anyone can (not easily) fetch some .wasm
- if someone fetches .wasm, they need to decompile to get AES key
- if they have AES key and .wasm they can generate key and use software
- decompile of .wasm to get AES would be different for every user and version
- harder than just one AES key for all versions and users
- cost of generation of .wasm assuming 10k customers and 5 minutes per build and 512RAM ~ $25 // 0.0000000083 * 1000 * 60 * 5 * 10000
- assuming we have container with rust installed - should be possible - need PoC
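The cost estimate above, spelled out as a small calculation. This reads the constant `0.0000000083` as a price per millisecond at 512 MB, which is an assumption about where the number comes from. The variable names are illustrative.

```typescript
// Numbers taken from the note above; the price constant is used as written there.
const pricePerMs = 0.0000000083; // assumed: USD per millisecond at 512 MB
const buildMs = 5 * 60 * 1000; // 5 minutes per build
const customers = 10000;

const costPerBuild = pricePerMs * buildMs; // about $0.0025 per build
const totalCost = costPerBuild * customers; // about $25 for one build per customer

console.log(`per build: $${costPerBuild.toFixed(5)}, total: $${totalCost.toFixed(2)}`);
```

Note this covers one build per customer. Rebuilding for every release multiplies the total by the number of releases.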
-
✅ Each version/build to have different AES key?
- what are the implications ?
- user would have to change key with each new version (we can add postinstall step)
- we could verify if user even can have key for this new version (safer than checking dates on local machine)
- a key still can be shared among many ppl, but due to updates (auto updates in vscode!) it would be frequently replaced
- if we add fingerprint that would be safe enough, cannot easily copy-paste key
- we could do nightly builds to force to replace key more often
- each key get request would give you new, one-time refresh token
- impossible to share refresh token with others
- harder to generate fake license (needs to disassemble the key every time)
- what's the purpose of generating this key if we would have to use refresh token to get it?
- software features are locked until you get the key
- having a refresh token does not mean that you will be able to get a key (might have outdated account)
-
What if we would generate key on user device
- we would have to generate .wasm on demand
-
✅ Will partial .wasm impl be maintainable?
- let's not overcomplicate wasm part
- some really greedy cheaters would just lose their time
- blocking key copy-paste is good enough - we will use fingerprint
- maybe we should build just JS on demand in the cloud?
- ✅ how we differentiate operations like search, eslint, replace on wasm side and still having nice API
- remember codeQue can be used as a npm module
- wasm would have to control the flow of the program - pain in the ass ?
- we would have just different functions to do different things
- maybe we can somehow pass current stack trace to authorize xD ?
- if someone would try to overuse regular license to have company/project features - we don't care
- we can obfuscate license checks, so it's harder to use "search" check in place of "eslint" check
-
License v1.0 - alpha
- shared AES key and on demand 6 months license gen
-
License v1.1 - beta
- each release changes the AES key
- license generated using account on server (auth via github)
- device fingerprint
- each license key valid for device & version, github auth/my server refresh token to refresh key
-
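A sketch of what "each license key valid for device & version" could look like once the key has been decrypted and authenticated. The payload shape (`License`), the `expiresAt` field and the status names are assumptions. The decryption step itself is out of scope here.

```typescript
// Hypothetical decrypted license payload.
interface License {
  fingerprint: string;
  version: string;
  expiresAt: number; // epoch milliseconds
}

interface Runtime {
  fingerprint: string;
  version: string;
  now: number; // epoch milliseconds
}

type LicenseStatus = "ok" | "wrong-device" | "wrong-version" | "expired";

function checkLicense(license: License, runtime: Runtime): LicenseStatus {
  if (license.fingerprint !== runtime.fingerprint) return "wrong-device";
  if (license.version !== runtime.version) return "wrong-version";
  if (license.expiresAt <= runtime.now) return "expired";
  return "ok";
}
```

A "wrong-version" or "expired" result is the point where the refresh token would be used to request a new key.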
License v2 - with version for companies (eslint etc)
- fingerprint
- unique AES key for each organization/user
- cloud builds of .wasm
- sha keys for CI
To release vscode ext
- ✅ figure out how to store key in user home dir
- ❌ implement storing key in home dir
- ❌ implement module API
- ❌ release npm alpha pkg protected by AES key
- ❌ vscode ext implementation
- list features like
- select code to search
- include / exclude files dirs
- mimic normal search
❌ PoC / Implement vscode extension - mostly to understand how to license
- MVP needs to be vscode extension, cli is not convenient for users
✅ Add support for proposal syntaxes
✅ Add support for multiple wildcards
- ($$, $$) => {} is invalid while parsing function
- $_refN - currently without ref analysis
- $$_refN - currently without ref analysis
✅ Implement tests
✅ Add literal wildcards
- string literal cannot be replaced with identifier in some scenarios eg import
- we should be able to always use identifier wildcard in place of number
- we still need number wildcard for some cases (we want to have number, not any identifier)
✅ Add support for regexp identifier matches (on$ -> onClick, onHover etc)
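A minimal sketch of how a wildcard identifier such as `on$` could be turned into a regular expression. It assumes `$` stands for any run of characters (including none). `wildcardToRegExp` is a hypothetical helper name, and the optional flag mirrors the case-insensitive search idea noted earlier.

```typescript
// Assumption: `$` inside an identifier matches any run of characters.
function wildcardToRegExp(pattern: string, caseInsensitive = false): RegExp {
  const source = pattern
    .split("$")
    // Escape regex metacharacters in the literal parts of the pattern.
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${source}$`, caseInsensitive ? "i" : "");
}

const onHandler = wildcardToRegExp("on$");
const handlerNames = ["onClick", "onHover", "click"].filter((name) => onHandler.test(name));
console.log(handlerNames); // [ 'onClick', 'onHover' ]
```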
✅ Better handling of query errors
- return outside a function
- await outside async fn
- explore parse result errors
✅ Regex matching of identifier seems to be slow
- ✅ one perf issue was caused by prettier - fixed!
- double the time on mac for "import { $Plus } from 'react-icons$'"
- maybe instead of "." regex we could be more specific
- ✅ it might be caused by lack of keywords for initial search
- try to use keywords regexes in tokens search
- ✅ try to escape "$" from tokens
- should be faster than several regex
- try to use language keywords like import, for, as
✅ improve query parsing
- first try to parse without brackets, then add brackets and parse once again
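The two-step parse could be sketched as below. The parser is passed in as a function, so nothing here depends on a specific parser API. `parseQueryWithFallback` is a hypothetical name, and "brackets" is assumed to mean wrapping the query in parentheses.

```typescript
type Parse<T> = (code: string) => T;

// `parse` is any parser that throws on invalid input.
function parseQueryWithFallback<T>(query: string, parse: Parse<T>): T {
  try {
    return parse(query);
  } catch (firstError) {
    try {
      // Second attempt: wrap the query so it is read as an expression.
      return parse(`(${query})`);
    } catch {
      // Report the error for the query the user actually typed.
      throw firstError;
    }
  }
}
```

Rethrowing the first error keeps the message pointing at the original query rather than at the wrapped one.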
✅ Add support for nested gitignore
✅ Do benchmark (done)
- mac 1.4s
- desktop 2.6s
- laptop 4.5s
✅ Do profiling
- maybe we can optimize by identifiers search
- probably there is a number of identifiers that we can search for to gain time, but if we search for too many, we will lose time
- just one identifier is a good starting point
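A sketch of the identifier pre-filter described above: before parsing a file, check with a plain string search whether it contains the first concrete identifier from the query. `mayContainMatch` and the `maxChecks` parameter are hypothetical, and wildcard identifiers are assumed to be any identifier containing `$`.

```typescript
// Cheap text check that runs before the expensive parse + AST comparison.
function mayContainMatch(
  fileContent: string,
  queryIdentifiers: string[],
  maxChecks = 1, // "just one identifier is a good starting point"
): boolean {
  // Wildcards cannot be looked up as plain text, so skip them.
  const concrete = queryIdentifiers.filter((id) => !id.includes("$"));
  return concrete.slice(0, maxChecks).every((id) => fileContent.includes(id));
}
```

With no concrete identifiers in the query, the function returns true and the file is parsed as usual.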
Get files edited since last commit echo $(git diff --name-only HEAD)
❌ Notion this Readme !
❌ Think of strategy
- 1st make a tool and test it with friends and Dweet
- 2nd start youtube channel / blog / your other media here and speak about tooling, bundling etc
- make a list of videos with ToC that I would like to record
Further product development
💡 Feature import-based search
- search in file and all files imported by a file
- eg. your test failed
- you search for test based on name
- you specify a query to find failing code patterns in files imported by test
💡 Think of negation syntax and sense (just to make it future-proof for now)
- could be something like:
$not('asd')
- it might execute 2 (or more) searches and filter results if there are 2 the same
💡 Think of and, or syntax and sense (just to make it future-proof for now)
- could be something like:
$and('asd', $not(() => {}))
- jsx excluding some prop
$and(<somejsx>, $not(<somejsx prop={$$}/>))
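One way to read "execute 2 (or more) searches and filter results": run each sub-query separately, then combine the result lists by match position. The `Match` shape and the helper names are assumptions for illustration.

```typescript
// Hypothetical search result: a location in a file.
interface Match {
  file: string;
  start: number;
  end: number;
}

const keyOf = (m: Match): string => `${m.file}:${m.start}:${m.end}`;

// $and: keep matches reported by both searches.
function andMatches(a: Match[], b: Match[]): Match[] {
  const keys = new Set(b.map(keyOf));
  return a.filter((m) => keys.has(keyOf(m)));
}

// $not: drop matches that the negated search also reported.
function notMatches(base: Match[], negated: Match[]): Match[] {
  const keys = new Set(negated.map(keyOf));
  return base.filter((m) => !keys.has(keyOf(m)));
}
```

For the JSX example above, the first list would hold all matches of the tag and the second list the matches that have the prop. `notMatches` then leaves only tags without the prop.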
💡 Think of support for ref matching
- user should be able to indicate that two wildcards are the same identifier
- eg.
const $_ref1 = 'string'; call($_ref1)
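A sketch of the consistency check behind ref matching: after a structural match, every occurrence of the same `$_refN` wildcard must have been bound to the same identifier name. The `Binding` shape is an assumption, and this version compares names only, without any scope analysis.

```typescript
// One wildcard occurrence and the identifier it matched in the code.
type Binding = { ref: string; identifier: string };

// True when each ref name is bound to a single identifier across the match.
function refsAreConsistent(bindings: Binding[]): boolean {
  const seen = new Map<string, string>();
  for (const { ref, identifier } of bindings) {
    const previous = seen.get(ref);
    if (previous !== undefined && previous !== identifier) return false;
    seen.set(ref, identifier);
  }
  return true;
}
```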
💡 Add query extensions
$type()
- to create type matcher
- can be only used top-level
$exact(), $include(), $includeWithOrder()
- to change mode in given code path
- <$ $={() => {}} /> will match functions with body, which we don't want
$fn(() => {})
- alias for 3 types of function definition
- effectively executes 3 queries
- It might be useful to search for expressions within nested structures inside functions to make it more useful
- it might need special operator like
$nested()
$jsx()
- for jsx tags when children can be ignored
- executes 2 queries, for self-closing and non-self-closing tags
💡 Think of other use cases for the matching functionality (call the whole product code-magic)
- should the product be a licensed cli ?
- vscode search extension
- other editors extensions (how to, which languages)
- Webstorm
- Java, but can execute JS somehow - need more reading
- cli search - why not
- standalone desktop app
- eslint plugin restricted syntax
- check in autozone if custom plugins could be replaced
- check which of the existing plugins could be replaced
- plugin should have reference analysis (user should be able to mark that two identifiers should be the same, eg using $_ref1)
- there might be a problem to show error in specific line where it happens, since we usually need to outline more context in query to capture the problem.
- Market research
- automated codemod - this one needs a PoC
- check some codemods
- program should be able to get diff of AST
- 3 steps
- implement query
- implement transformed query
-> generate AST diff and use it as a transform (try use json-diff with removed misc keys)
- show example result
- predefined codemod snippets to apply on file
- eg. transform props into {prop1, prop2} based on which keys are used
- a) it could be eslint plugin / no need for code-magic for that
- b) it might be impossible to implement with current approach to codemod
- for codemod and eslint we need to be able to reference a variable by identifier, to be able to track references for more complex cases
- track duplicated code - how (eg. pattern to match all DB queries, then exact compare of AST)
- this could be integrated into editor, so it could search duplicates as you type code
- predefined patterns to find in current file
- if pattern is found in given file, search for exact code in other files
- metrics: project has 1000 DB queries, project has 3000 react components
- check what SonarQube can measure
- tool like rev-dep could be part of code-magic toolset
- think how it could improve refactoring
- it should not only resolve imports, but references in code as well, so it would be more accurate (should resolve like stack trace)
- it helps with
- refactoring & finding all refactored views to test them
- saves a lot of time spent on manual references lookup
- Feature: get all values of given property
- eg. to assert unique test-ids across all files
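A small sketch of the uniqueness assertion: given the property values collected by a search, report every value that occurs more than once together with the files it appears in. The `PropertyValue` shape and `findDuplicateValues` are hypothetical.

```typescript
// Hypothetical input: one entry per matched property value.
interface PropertyValue {
  value: string;
  file: string;
}

function findDuplicateValues(values: PropertyValue[]): Map<string, string[]> {
  const filesByValue = new Map<string, string[]>();
  for (const { value, file } of values) {
    const files = filesByValue.get(value) ?? [];
    files.push(file);
    filesByValue.set(value, files);
  }
  // Keep only values that were seen more than once.
  const duplicates = new Map<string, string[]>();
  filesByValue.forEach((files, value) => {
    if (files.length > 1) duplicates.set(value, files);
  });
  return duplicates;
}
```

An empty result means all collected test-ids are unique.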
- Feature - get unique values of $_ref/$$_ref in query
- Feature: ast-based diff to outline what actually changed in code logic
- needs more reading on how to integrate that into git
- Tool : "Import hygiene" - dependency graph summary and statistics, assertions
- need research on how it could improve codebase on daily basis
- need research how to present information in consumable way
- sort imports to solve css ordering problem
- make assertions to not import certain file in certain paths
- like file with api keys that should only be on server side
- a given file should have list of allowed entry points
- some data on how convoluted and hard to maintain your dependency graph is
- mostly for myself so I know if project is in a good shape
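The "do not import certain file in certain paths" assertion could be sketched as a reachability check over the dependency graph. The graph shape (`ImportGraph`) and `findImportPath` are assumptions. A real tool would build the graph by resolving imports.

```typescript
type ImportGraph = Record<string, string[]>; // file -> files it imports

// Breadth-first search: returns the import chain from `entry` to `forbidden`,
// or null when `forbidden` is not reachable from `entry`.
function findImportPath(graph: ImportGraph, entry: string, forbidden: string): string[] | null {
  const queue: string[][] = [[entry]];
  const visited = new Set<string>([entry]);
  while (queue.length > 0) {
    const path = queue.shift()!;
    const current = path[path.length - 1];
    if (current === forbidden) return path;
    for (const next of graph[current] ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push([...path, next]);
      }
    }
  }
  return null;
}
```

Returning the whole chain rather than a boolean shows which intermediate file pulls the server-only module into client code.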
💡 Add support for suggestions based on equivalent/similar syntax
- user input: <$ prop={"5"} />, suggestion: <$ prop="5" />
- user input: <$ prop={$+$} />, suggestion: <$ prop={$-$} />
💡 Add hints based on first node
- user input: {a:b}, hint: You probably need ({a:b}), right now it is a block statement
- user input: "some string", hint: You probably need ("some string"), right now it is a directive
💡 To secure the code we should
- verify license in WASM
- implement parts of the algorithm in WASM
- implemented parts do not work if license is not verified
💡 Add support for flow
- Probably needs a refactor similar to different language refactor
- maybe we could look for @flow comment and configure babel based on that
💡 Pricing
- if fingerprinting is added, each seat can have 3 fingerprints
- Free only exact mode, no wildcards, no other features
- Paid $19 / year (dev)
- search with all features
- code stats
- exclude replace, ref analysis, import resolution
- Paid $29 / year (pro)
- search + replace + ref analysis + import resolution
- code stats
- Company/project $29 / month
- up to 10 users (+$3 for each additional user)
- limiting users amount does not make sense, since we cannot validate that
- we can if each user would have unique key + on CI we would use ssh identity to receive license key
- all the above + eslint rules
💡 Product website
- home
- docs
- playground
- examples
⌛ Code smells check script
- more than 5 if statements in the block
- spread overuse
- react literal prop values
- some others based on common eslint rules
⌛ Marketing Implement stats script and encourage ppl to share their results on Twitter
- N files
- N JS/TS files
- N import statements
- N require statements
- N string literals
- N empty strings
- N zeros
- N functions
- N arrays
- N objects literals
⌛ Marketing use-cases for search
- You want to find how a component is used across the codebase to see examples without going to docs
- You want to track where the piece of code is duplicated across the codebase
- You spot a code pattern that can cause issues (eg. react falsy event listener) and you want to check where else it is used
- You want to check places where a component with specific set of props is used while refactoring to test changes properly
- You are curious about usage statistics of some patterns in the codebase (count of components, functions)
- there could be predefined set of measurements to run
- More specific
- You are adding i18n and you want to check where in codebase a specific text string is used
⌛ Marketing use-cases for eslint rule
⌛ Marketing use-cases for eslint codemod
- "Pay tech debt quicker"
- after changing prisma data model we can find interface/class for changed entity and adjust fields
💡 Variable's binding tracking needed for refactoring (replace)
- don't compare scope for non $_ref identifiers
- check scope only for refs
- find a way to track scope of the ref
- remember scope of the first ref occurrence
- match ref for every identifier in subtree
- mark if identifier is redeclared
- create scope analysis object
- don't start scope analysis if identifier is not declared only once in query
- if there is identifier redeclaration that is not a part of query stop checking further
- to start working on it we need to have multi body statements queries (extensively considered in search & replace test file)
Some use cases for replace fn
From
import React from 'react'
$nested(
React.useState($$)
)
To
import React, {useState} from 'react'
$nested(
useState($$)
)
From
<Box>
<Inner prop="val"/>
</Box>
To
<Inner prop="val"/>
From
<>
{isMobile && (
<SetInitialPageType
pageType={pageType}
setPageType={setPageType}
/>
)}
</>
To
<>
<SetInitialPageType
pageType={pageType}
setPageType={setPageType}
isMobile={isMobile}
/>
</>
AST builder and AST finder - JScodeshift helpers
/**
* How to replace?
*
* 1. We have to treat each body item in query as a sub-query, join them with logical 'AND'
* - sub-queries would be required only for non-exact mode
* - we should treat all nested block statements similarly to sub-queries
* - we cannot rely on index number e.g to delete a node from body, since actual file body might have more elements than query
* 2. For each sub-query we generate hash, and we add that hash into file AST to mark node as "to replace" and link to the sub-query
* 3. For each replacement sub-query we generate the same hash (hash has to be deterministic)
* 4. We generate a diff for each sub-query
* - diff should be agnostic to custom matchers ($nested, $jsx etc.)
* - think of cases where it wouldn't be, maybe some $or $and $not ??
* - there should be a rule that you cannot change custom matchers in the "replace"
* - same applies to wildcard matches
* - maybe somehow we could support replacements in partial wildcards
* - like `some-path/$` to `new-path/$`
* - that's for later, can be done with regex, seems not be often used
* - There are problems
* - what if a given subquery would be totally removed?
* - maybe we should have $remove()
* - what if a given subquery is totally new?
* - maybe we should have $add()
* - similar problem occurs in nested block statements
* 5. Once we have a diff (to add, to delete, to update) we should find node in file with hash matching the subquery hash
* 6. We start to traverse the code to apply diff changes
* - since we are based on the diff we don't touch any props that are outside of the search match
* - we should take into consideration, that there would almost always be nodes nested to our query
* - eg. nested JSX in our diff which we cannot remove, some object expressions
*
* Important note: Implementing replace for non-exact mode would be super hard
* - we would need custom diff algorithm to detect removal of some intermediate nodes
* - eg. removing <Box> from <Flex><Box><Text>Abc</Text></Box></Flex>
* - if <Text> contains some additional props not listed in query, we would just remove them with deep-object-diff based approach
* - need to think of how to reconcile that kind of change
* - needs custom diff algorithm that would traverse removed path to find a node with matching shape
* - matching shape means matching identifiers ??? maybe similar to how validateMatch works for different types
* - maybe it should run another deep-object-diff if it would find node with matching type
* - that could work
* - we could call it 'replace with linked node' or 'short circuit node' or 'remove intermediate node'
* - case with removal of an if () {} but keeping part of block content.
* - we should handle that so we need to check if removal was done in block context (or maybe more generic in nodes array)
* Next steps
* 1. Implement a PoC with just one sub query for exact mode
*/
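A sketch of the "remove intermediate node" case from the note above, on a deliberately simplified tree instead of a real AST. The `TreeNode` shape and `removeIntermediate` are hypothetical. The point is that children of the removed node are re-attached untouched, so props not listed in the query survive.

```typescript
// Simplified stand-in for a JSX element; not a Babel AST node.
interface TreeNode {
  type: string;
  props: Record<string, string>;
  children: TreeNode[];
}

// Replaces every node of `typeToRemove` with its own children.
function removeIntermediate(node: TreeNode, typeToRemove: string): TreeNode[] {
  const children: TreeNode[] = [];
  for (const child of node.children) {
    children.push(...removeIntermediate(child, typeToRemove));
  }
  return node.type === typeToRemove ? children : [{ ...node, children }];
}

// <Flex><Box><Text color="red" /></Box></Flex>
const tree: TreeNode = {
  type: "Flex",
  props: {},
  children: [
    { type: "Box", props: {}, children: [{ type: "Text", props: { color: "red" }, children: [] }] },
  ],
};

const [withoutBox] = removeIntermediate(tree, "Box");
console.log(JSON.stringify(withoutBox));
```

Removing an `if () {}` while keeping part of its block content would need the same re-attachment, applied inside an array of statements.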
Code duplication research
- bookmarks folder code duplication research
- video summary in ./Movies/code-duplicates.mp4
✅
❌
⌛
💡