Governments and corporations are promoting research and development (R&D) on new, internationally competitive growth engines to overcome the ongoing global economic downturn. To create valuable patented technology that can compete internationally, they use patent trend analysis to set the direction of R&D, starting from the planning and evaluation stages.
Such patent trend analysis, however, is time-consuming and error-prone, which is a serious problem: patent researchers must manually examine the extracted candidate patent documents one by one and understand patented technologies that often lie outside their own expertise.
In this dissertation, we propose a method for extracting core patent documents using information retrieval and machine learning. The method consists of three steps: 1) extract valid patent documents from the documents retrieved by a patent search service; 2) classify the valid patent documents into sub-technology categories; and 3) extract core patent documents from the valid patent documents in each sub-technology category. The first step ranks the retrieved patent documents by the cosine similarity between the vector of each document and the vector of the technical summary that describes the queried technology, yielding the valid patent documents for that technology. The second step classifies the valid patent documents into sub-technology categories using a five-layer neural network whose input consists of the TF-IDF weights and technology-related weights of each valid patent document. The final step ranks the valid patent documents in each sub-technology category by a linear combination of patent feature values (for instance, impact factor, number of family nations, and cosine similarity) weighted by a patent feature priority, and extracts the core patent documents from this ranking.
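The first step can be illustrated in a few lines of plain Python. This is a minimal sketch under assumptions the abstract does not state: documents are represented as TF-IDF vectors with a smoothed IDF, tokenization is a simple lowercase regex split, and a hypothetical `threshold` on the cosine score decides which retrieved documents count as valid. The function names (`tfidf_vectors`, `rank_by_summary`) and the toy texts are illustrative only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs; a real system would likely also
    # drop stop words and stem terms (assumption, not from the source).
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1 keeps every weight positive.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_by_summary(summary, patents, threshold=0.1):
    """Rank retrieved patents by similarity to the technical summary.

    `threshold` is a hypothetical cut-off for declaring a document valid.
    Returns (patent index, score) pairs, highest score first.
    """
    vectors = tfidf_vectors([summary] + patents)
    query, docs = vectors[0], vectors[1:]
    scored = [(i, cosine(query, d)) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(i, s) for i, s in scored if s >= threshold]


summary = "radioactive isotope labeled compound for tumor imaging"
patents = [
    "labeled compound with radioactive isotope for imaging tumor tissue",
    "battery electrode material with lithium",
    "isotope generator for medical use",
]
result = rank_by_summary(summary, patents)
```

With this toy data the imaging patent ranks first, the isotope-generator patent passes the cut-off with a lower score, and the battery patent, which shares no terms with the summary, scores zero and is filtered out.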
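For the second step, the sketch below only illustrates the shape of a five-layer classifier. It assumes that "five-layer" counts the input and output layers (input, three hidden layers, output), with ReLU hidden activations and a softmax output over sub-technology categories. The feature values, layer widths, and the three categories are hypothetical, and the weights are random rather than trained, so the predicted category carries no meaning until the network is fitted (for example by backpropagation, which is omitted here).

```python
import math
import random


def build_network(layer_sizes, seed=0):
    """Randomly initialise a fully connected network.

    layer_sizes lists the width of every layer, input and output included,
    so five entries describe input, three hidden layers, and output.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        bound = math.sqrt(6.0 / (n_in + n_out))  # Glorot-uniform range
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers, x):
    """Return category probabilities for one feature vector."""
    a = x
    for k, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, a)) + b for row, b in zip(weights, biases)]
        # ReLU on hidden layers, softmax on the output layer.
        a = softmax(z) if k == len(layers) - 1 else [max(0.0, v) for v in z]
    return a


# Hypothetical input: TF-IDF weights concatenated with technology-related weights.
tfidf_part = [0.8, 0.0, 0.3, 0.5]
tech_part = [1.0, 0.2]
x = tfidf_part + tech_part
net = build_network([len(x), 8, 8, 8, 3])  # 3 hypothetical sub-technology categories
probs = forward(net, x)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

In practice a library implementation such as scikit-learn's `MLPClassifier` with three hidden layers would replace this hand-written forward pass and also handle training.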
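The final step's scoring can be sketched as a weighted sum of normalized feature values. The abstract does not say how the patent feature priority is turned into weights, so this sketch assumes a hypothetical linear rule in which priority 1 receives the largest weight and all weights sum to 1. Each feature is min-max normalized across the documents so that features on different scales (citation-based impact versus a count of family nations) are comparable. The feature names, values, and `top_n` are made up for illustration.

```python
def priority_weights(priority):
    """Map feature -> priority rank (1 = most important) to weights summing to 1.

    The linear rank-to-weight rule is a hypothetical choice.
    """
    k = len(priority)
    raw = {f: k - r + 1 for f, r in priority.items()}
    total = sum(raw.values())
    return {f: v / total for f, v in raw.items()}


def min_max(values):
    """Rescale values to [0, 1]; a constant feature maps to 0."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def rank_core_patents(patents, priority, top_n):
    """Score patents by a priority-weighted linear combination of features."""
    weights = priority_weights(priority)
    features = list(priority)
    normed = {f: min_max([p[f] for p in patents]) for f in features}
    scores = [sum(weights[f] * normed[f][i] for f in features) for i in range(len(patents))]
    order = sorted(range(len(patents)), key=lambda i: scores[i], reverse=True)
    return [(patents[i]["id"], round(scores[i], 3)) for i in order[:top_n]]


patents = [
    {"id": "P1", "impact_factor": 2.0, "family_nations": 5, "cosine": 0.62},
    {"id": "P2", "impact_factor": 0.5, "family_nations": 12, "cosine": 0.71},
    {"id": "P3", "impact_factor": 1.1, "family_nations": 3, "cosine": 0.40},
]
priority = {"impact_factor": 1, "family_nations": 2, "cosine": 3}
core = rank_core_patents(patents, priority, top_n=2)
```

Here the weights come out as 1/2, 1/3, and 1/6, so P1, which leads on the highest-priority feature, outranks P2 even though P2 leads on the other two.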
For the evaluation, we analyzed patent trends in radiopharmaceuticals as a case study. The patent search service retrieved 4,603 candidate patent documents for the technical summary used as the query. We compared the results of the proposed system with those obtained manually by a patent investigator in terms of execution time and accuracy. The manual analysis took 13,095 minutes, whereas the proposed system performed the same operations in 134 minutes, about 97.7 times faster. The proposed system achieved an accuracy of 86.88% in extracting valid patent documents, 91.08% in classifying them into sub-technology categories, and 75.76% in extracting core patent documents.
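The speed-up and the absolute time saving follow directly from the two reported execution times:

```latex
\text{speed-up} = \frac{13{,}095\ \text{min}}{134\ \text{min}} \approx 97.7,
\qquad
\text{time saved} = 13{,}095 - 134 = 12{,}961\ \text{min} \approx 216\ \text{h}
```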
Consequently, we have shown that the proposed system is effective: it helps patent researchers save time and reduce errors. In future work, we will improve the accuracy of the proposed system using state-of-the-art techniques such as deep learning and apply it to technology areas other than radiopharmaceuticals.