How I did a Basic Feature Search with spaCy

For my feature search and sentiment analysis project, I spent this week testing the first version of the feature search itself. Last week went to tagging data to evaluate the accuracy of the POS tagger; this week was the first with real feature classification on actual data. With this done, the first chunk of my project is practically complete. Here’s what happened and how I did it.

Step 1: Choosing Relevant Data

Unlike last week's dataset, which was used to evaluate the POS tagger, this one is the real deal: the full Yelp Reviews dataset, all 4.72 gigabytes of it. The review JSON file is a massive series of dictionaries, each corresponding to a unique review. Each review dictionary contains several keys, including the review id, user id, business id, star rating, date, the full review text, and whether the review is useful, funny or cool.

For now, I only needed the business id and the full review text. The JSON file’s dictionaries are organized by review id, not business id, so I trawled the entire file to collect all the reviews corresponding to each business.
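The grouping step can be sketched roughly as follows. This is a minimal sketch, not my exact script: it assumes the file holds one JSON review object per line, and the key names "business_id" and "text" are assumptions standing in for the business id and full review text fields.

```python
import json
from collections import defaultdict


def collect_reviews_by_business(lines):
    """Group review texts by business id.

    `lines` is any iterable of strings, each holding one JSON review
    object (assumed layout; an open file handle also works).
    """
    reviews = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        review = json.loads(line)
        # "business_id" and "text" are assumed key names.
        reviews[review["business_id"]].append(review["text"])
    return dict(reviews)


# Tiny in-memory stand-in for the real 4.72 GB file.
sample_lines = [
    '{"review_id": "r1", "business_id": "b1", "text": "The veal was tender."}',
    '{"review_id": "r2", "business_id": "b2", "text": "Slow service."}',
    '{"review_id": "r3", "business_id": "b1", "text": "Great pizza."}',
]
grouped = collect_reviews_by_business(sample_lines)
```

Reading line by line keeps memory use tied to the collected texts rather than the whole file, which matters at this size.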

For each business, I processed every sentence of every review and saved it in three forms: raw text, POS-tagged text and dependency-tagged text. Each raw or transformed sentence became a list of that sentence’s words, and all the sentence lists were appended to a master list. Each master list was then stored in a dictionary under a key naming the type of text it held. For ease of access, and to save the results, I wrote each set of data to a text file named after the business id.
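Here is one way that dictionary of master lists could look in code. It is a sketch under assumptions: the function names, the "raw"/"pos"/"dep" keys and the JSON file layout are all hypothetical, and the stub document is hand-labeled so the example runs without a spaCy model installed. A real spaCy Doc has the same shape (`.sents`, and tokens with `.text`, `.pos_`, `.dep_`).

```python
import json
import os
from types import SimpleNamespace


def sentence_views(doc):
    """Return raw, POS-tagged and dependency-tagged views of each sentence.

    `doc` is anything with a `.sents` iterable of token sequences whose
    tokens expose `.text`, `.pos_` and `.dep_`.
    """
    views = {"raw": [], "pos": [], "dep": []}  # hypothetical key names
    for sent in doc.sents:
        views["raw"].append([tok.text for tok in sent])
        views["pos"].append([[tok.text, tok.pos_] for tok in sent])
        views["dep"].append([[tok.text, tok.dep_] for tok in sent])
    return views


def save_views(views, business_id, directory="."):
    """Write the views as JSON to <directory>/<business_id>.txt (assumed layout)."""
    path = os.path.join(directory, f"{business_id}.txt")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(views, handle)
    return path


def _tok(text, pos, dep):
    # Hand-labeled stand-in for a spaCy token, not parser output.
    return SimpleNamespace(text=text, pos_=pos, dep_=dep)


stub_doc = SimpleNamespace(sents=[[
    _tok("The", "DET", "det"),
    _tok("veal", "NOUN", "nsubj"),
    _tok("was", "AUX", "ROOT"),
    _tok("tender", "ADJ", "acomp"),
    _tok(".", "PUNCT", "punct"),
]])
views = sentence_views(stub_doc)
```

With spaCy installed, the stub could be swapped for something like `nlp = spacy.load("en_core_web_sm")` followed by `sentence_views(nlp(review_text))`.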

Step 2: Processing Problematic POS

After collecting the data suitable for processing, I started using spaCy. SpaCy is straightforward: you pass your raw text to the function nlp() and it returns a sequence of token objects. For each word or token in a sentence or series of sentences, it exposes attributes like token.text, token.pos and so on. The only attributes we care about are token.pos and token.dep, which are, respectively, the part-of-speech tag and the word dependency for a given word. (In spaCy the human-readable string versions carry a trailing underscore, token.pos_ and token.dep_; the plain attributes return integer ids.)

In this case we only care about the POS tags denoting a noun, a proper noun and an adjective. We’ve sharply reduced the number of POS tags compared with when we evaluated the POS tagger, because right now the focus is on prototyping the feature search part of my project. (That means identifying just the proper nouns, nouns and adjectives in a given sentence.)
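As a rough illustration, the filter amounts to keeping tokens whose tag is in a small set. The tag names NOUN, PROPN and ADJ are spaCy's coarse-grained labels; the function name and the hand-labeled stub tokens below are assumptions for the sake of a runnable example, not output from the tagger.

```python
from types import SimpleNamespace

# spaCy's coarse-grained tags for noun, proper noun and adjective.
KEEP_POS = {"NOUN", "PROPN", "ADJ"}


def keep_feature_words(tokens):
    """Return (text, tag) pairs for nouns, proper nouns and adjectives only."""
    return [(tok.text, tok.pos_) for tok in tokens if tok.pos_ in KEEP_POS]


# Hand-labeled stand-ins for spaCy tokens (hypothetical tags).
stub_tokens = [
    SimpleNamespace(text="The", pos_="DET"),
    SimpleNamespace(text="veal", pos_="NOUN"),
    SimpleNamespace(text="was", pos_="AUX"),
    SimpleNamespace(text="tender", pos_="ADJ"),
]
feature_words = keep_feature_words(stub_tokens)
```

The same function accepts a real spaCy Doc, since iterating over a Doc yields tokens with a `.pos_` attribute.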

However, identifying just the nouns, proper nouns and adjectives immediately runs into the issue of multiple nouns and adjectives in a single sentence. When there are several of each, how can we tell which noun is the main topic being described? Take a review sentence whose POS tags give us the nouns “veal” and “lemon” and the adjectives “tender” and “juicy”.

While we know that the veal is most likely being described as tender, we can’t tell whether juicy is describing the veal or the lemon. Furthermore, while to human readers it might seem obvious that a sentence like this from a restaurant review would most likely have the veal as its main subject, both veal and lemon are labeled as nouns. Since they have the same tag, they carry equal weight, so as far as the computer is concerned, the “subject” of that sentence is both the veal and the lemon.

Furthermore, the order in which adjectives occur can vary greatly and cause problems for our feature search. Adjectives can come before or after the nouns they describe. This applies to the example above as well: the actual sentence may have been describing the veal as both “tender” and “juicy”, and not describing the lemon as “juicy” at all. Thus it is clear we need something else to tie adjectives to their appropriate nouns.

That’s why we use word dependencies. Word dependencies describe the relationships between words, or their usage, within a sentence using specific linguistic tags like ‘adjectival modifier’. These dependencies go one step beyond basic POS tags like ‘adjective’ because they record which word each word attaches to. Take the same sentence containing veal and lemon, but now run it through the word dependency tagger. We see that it returns the following.

[Figure: what the word dependency tagger returns for this sentence.]

But what if you’re like me and haven’t seriously studied grammar and linguistics? As we’ll see shortly, that’s okay.

Step 3: Detecting actual relationships

Not only are there a lot of dependency tags that can describe the relationships between nouns and adjectives, but understanding each one can also be very time consuming. So maybe we can get away with just figuring out the most relevant dependency tags by using spaCy's dependency tree visualizer.

For the full sentence, “The pizza was delicious and the veal was tender, juicy and not overpowered with lemon or capers”, we get a partial result like this, displayed onscreen with the word dependency visualizer displaCy.

[Figure: the word dependency tags relating to veal.]
[Figure: the word dependency tags relating to lemon.]

From this example we can clearly see that the two adjectives from our previous example were describing the veal and were labeled “acomp” (adjectival complement). In future, when we iterate through a similar sentence, we can use a conditional statement that selects a sequence of words tagged “nsubj” (nominal subject) and “acomp”.
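A minimal sketch of such a conditional is below. It pairs each “nsubj” word with the “acomp” words attached to the same head. The function name is hypothetical, and the stub tokens are hand-labeled to mirror the description above rather than taken from the parser. They expose `.i`, `.text`, `.dep_` and `.head`, which real spaCy tokens also provide.

```python
from types import SimpleNamespace


def subject_adjective_pairs(tokens):
    """Pair each "nsubj" token with every "acomp" token sharing its head."""
    tokens = list(tokens)
    pairs = []
    for subj in tokens:
        if subj.dep_ != "nsubj":
            continue
        for tok in tokens:
            # Compare head positions (.i) rather than the objects themselves.
            if tok.dep_ == "acomp" and tok.head.i == subj.head.i:
                pairs.append((subj.text, tok.text))
    return pairs


def _tok(i, text, dep):
    # Hand-labeled stand-in for a spaCy token; head is filled in below.
    return SimpleNamespace(i=i, text=text, dep_=dep, head=None)


the, veal, was, tender, juicy, with_, lemon = (
    _tok(0, "the", "det"),
    _tok(1, "veal", "nsubj"),
    _tok(2, "was", "ROOT"),
    _tok(3, "tender", "acomp"),
    _tok(4, "juicy", "acomp"),
    _tok(5, "with", "prep"),
    _tok(6, "lemon", "pobj"),
)
the.head = veal
for child in (veal, was, tender, juicy, with_):
    child.head = was  # the root points at itself, as in spaCy
lemon.head = with_

pairs = subject_adjective_pairs([the, veal, was, tender, juicy, with_, lemon])
```

Here the lemon is left out because it is neither a subject nor attached to the verb as an adjectival complement, which is the disambiguation the POS tags alone could not give us.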

Although we only examined the word dependencies using displaCy for one example sentence, we will need to do this several times for different sentences. Only then can we identify enough word dependencies, and the orders in which they occur, to capture the variety of ways adjectives modify nouns (e.g., what if an adjective tagged “acomp” came before the noun it was describing?).

Next Steps

I still need to label some of the review data with the actual nouns and the adjectives describing them. I need this to evaluate how well my model works using the word dependencies I’ve just explained. Once that’s done, I can attach a sentiment analyzer, rate my adjectives as positive or negative, and have a full prototype of my project!

Woohoo!
